Review

Google Earth Engine and Artificial Intelligence (AI): A Comprehensive Review

1. Department of Geography and Environmental Studies, University of New Mexico, Albuquerque, NM 87131, USA
2. Center for the Advancement of Spatial Informatics Research and Education (ASPIRE), University of New Mexico, Albuquerque, NM 87131, USA
3. Department of Computer Science, University of New Mexico, Albuquerque, NM 87106, USA
4. Department of Geography, University of Tennessee, Knoxville, TN 37996, USA
5. Interdisciplinary Science Co-Operative, University of New Mexico, Albuquerque, NM 87131, USA
* Author to whom correspondence should be addressed.
Remote Sens. 2022, 14(14), 3253; https://doi.org/10.3390/rs14143253
Submission received: 2 May 2022 / Revised: 28 June 2022 / Accepted: 2 July 2022 / Published: 6 July 2022
(This article belongs to the Special Issue The Future of Remote Sensing: Harnessing the Data Revolution)

Abstract:
Remote sensing (RS) plays an important role in gathering data in many critical domains (e.g., global climate change, risk assessment and vulnerability reduction of natural hazards, resilience of ecosystems, and urban planning). Retrieving, managing, and analyzing large amounts of RS imagery poses substantial challenges. Google Earth Engine (GEE) provides a scalable, cloud-based, geospatial retrieval and processing platform. GEE also provides access to the vast majority of freely available, public, multi-temporal RS data and offers free cloud-based computational power for geospatial data analysis. Artificial intelligence (AI) methods are a critical enabling technology for automating the interpretation of RS imagery, particularly in object-based domains, so the integration of AI methods into GEE represents a promising path towards operationalizing automated RS-based monitoring programs. In this article, we provide a systematic review of relevant literature to identify recent research that incorporates AI methods in GEE. We then discuss some of the major challenges of integrating GEE and AI and identify several priorities for future research. We developed an interactive web application designed to allow readers to intuitively and dynamically review the publications included in this literature review.

1. Introduction and Motivation

Big data approaches have been making substantial changes in science and in society at large [1,2]. Geospatial big data, which are collected with ubiquitous location-aware sensors and are inherently geospatial [3], make up a significant portion of big data. The size of such data is growing rapidly, by at least 20% per year [4]. The United Nations Initiative on Global Geospatial Information Management (UN-GGIM) estimated that 2.5 quintillion bytes of data (one quintillion bytes = 1000 petabytes (PB); 1 PB = 1000 terabytes (TB)) are generated every day, a large portion of which is location-aware. About 25 PB of data are generated per day at Google alone, a significant portion of which is spatio-temporal [4]. This trend will only accelerate as the world becomes more mobile and as imagery from unoccupied aircraft systems (UAS) and satellites is acquired more often and at higher resolutions [5]. Along with this exponential increase in geospatial big data, the need for cloud computing and high-performance computing for modeling, analyzing, and simulating geospatial contents is also rapidly increasing [4]. Geospatial big data have recently gained attention from researchers and practitioners in geographic information science (GIScience) and remote sensing (RS) [6]. Efficient collection, management, storage, analysis, and visualization of big data have become critical for the development of intelligent decision systems and provide unprecedented opportunities for business, science, and engineering [7]. Handling the 5 “Vs” (volume, variety, velocity, veracity, and value [8]) of big data is still a very challenging task. This is even more challenging for RS imagery due to its large volume (i.e., high resolution and multiple bands) and long timespan; geospatial big data pose significant challenges to conventional geographic information systems (GIS) as well as RS approaches and platforms [9,10,11,12,13].
Geospatial big data, especially RS big data, have posed substantial challenges due to their large volume, high spatial-temporal resolution, and complexity. One of the most promising and practical solutions for analyzing RS big data is Google Earth Engine (GEE). GEE is a scalable, cloud-based geospatial retrieval and processing platform. It also provides access to the vast majority of freely available, public, multi-temporal RS data and offers free cloud-based computational power for geospatial data analysis [14,15,16]. More specifically, GEE provides free access to a multi-PB archive of geospatial datasets spanning over 40 years of historical and current Earth observation (EO) imagery, including satellite imagery (e.g., Sentinel from the European Space Agency (ESA), Landsat from the United States Geological Survey (USGS), Moderate Resolution Imaging Spectroradiometer (MODIS) from the National Aeronautics and Space Administration (NASA), the Cropland Data Layer (CDL) from the United States Department of Agriculture's (USDA) National Agricultural Statistics Service (NASS), and the National Agriculture Imagery Program (NAIP), also from the USDA), airborne imagery, weather and climate datasets, as well as digital elevation models (DEMs) [14,16]. These RS data can be efficiently imported and processed on the cloud platform, avoiding the need to download data to local computers for processing [17]. Along with computing and storage resources, GEE also supports many RS algorithms (e.g., image enhancement, image classification, and cloud masking), which are readily accessible and customizable and allow data processing and visualization at different scales through JavaScript or Python Application Program Interfaces (APIs) [14,16,18,19]. These capabilities eliminate most of the time-consuming preprocessing steps needed in traditional RS approaches.
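As a concrete illustration of the API style described above, the following minimal sketch (assuming the `earthengine-api` package and an authenticated account; the collection and band names follow the public GEE data catalog for Landsat 8 Collection 2 Level-2) builds a cloud-masked NDVI composite entirely server-side:

```python
# A minimal sketch of a GEE Python API workflow: filter a Landsat 8 collection,
# mask clouds with the QA band, and reduce to a median NDVI composite.
# Requires `pip install earthengine-api` and a prior ee.Authenticate().
COLLECTION = "LANDSAT/LC08/C02/T1_L2"
NIR, RED, QA = "SR_B5", "SR_B4", "QA_PIXEL"

def build_composite():
    import ee
    ee.Initialize()

    def mask_clouds(img):
        # Bit 3 of QA_PIXEL flags clouds in Collection 2 Level-2 products.
        qa = img.select(QA)
        return img.updateMask(qa.bitwiseAnd(1 << 3).eq(0))

    col = (ee.ImageCollection(COLLECTION)
           .filterDate("2021-01-01", "2021-12-31")
           .map(mask_clouds))
    # normalizedDifference([NIR, RED]) computes (NIR - RED) / (NIR + RED).
    ndvi = col.map(lambda img: img.normalizedDifference([NIR, RED]))
    return ndvi.median()  # computed server-side; nothing is downloaded
```

Everything here executes lazily on Google's servers; only a final export or visualization request moves data, which is the key difference from the download-then-process workflow of traditional desktop RS.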
The computational power of GEE, along with its comprehensive data catalog and data processing methods, makes GEE an ideal platform for solving geospatial big data problems. GEE allows researchers and practitioners to focus on developing and solving their domain problems by making it easier to retrieve data and algorithms and to compute all in one place. For example, the Landsat archive on GEE is already preprocessed for atmospheric and topographic effects, which saves researchers and practitioners a substantial amount of time and effort in downloading and preprocessing data [16]. GEE, with free planetary-scale geospatial big data (addressing the data availability, storage, and preprocessing challenges) and free computing resources, facilitates computationally cumbersome geospatial big data analysis for researchers and practitioners with minimal local computing and storage resources. GEE, in the parlance of the RS Communication Model, reduces the number of channels required to construct an RS system, and therefore the time required to go from query to result [20]. Researchers from a wide range of fields are able to generate multiscale (local, national, regional, continental, and global) insights that would have been nearly impossible without the geospatial big data and computing capacity available in GEE [21].
GEE provides the free cloud-computing platform to tackle geospatial big data challenges, and recent substantial advances in artificial intelligence (AI) can and will further elevate the power of GEE. We cover three of AI’s main subdisciplines in this paper: computer vision (CV), machine learning (ML) and its subdomain, deep learning (DL). These technologies are central to leveraging big data for applications in many domains and have achieved significant advances in a wide range of applications that have a high social impact, such as damage assessment and prediction of natural disasters (e.g., automatic flooding damage assessment [1] and wildfire prediction [22]) and healthcare [23,24,25]. Geospatial artificial intelligence (GeoAI) combines methods in spatial science (e.g., GIScience and RS), AI, data mining, and high-performance computing to extract meaningful knowledge from geospatial big data [26]. GeoAI stems from GIScience methods applied to RS data but has advanced the field of AI to solve geospatial-specific big data challenges and problems.
There are substantial separate bodies of research covering AI (especially CV, ML, and DL) and GEE; however, much less research directly combines the two. Allowing researchers and practitioners to harness the power of both GEE and AI for their research and real-world problems is the core motivation for us to investigate a range of recent developments that combine them. Thus, our paper can serve as an academic bridge for researchers and practitioners in GEE and AI, highlighting how scientists are using GEE and AI and in which domain areas. Researchers and practitioners in GEE and AI can draw on each other's strengths, moving science forward more effectively and efficiently and making it possible to tackle global challenges such as those related to climate change.

1.1. Selection Criterion for Reviewed Papers and Brief Graphic Summary

There is a substantial body of work on GEE (e.g., see recent reviews [14,18,27,28,29]) and on AI for RS (especially DL, ML, and CV used in an RS setting; see recent reviews in [30,31]). However, much less research has gone into detailing the integration of GEE with AI. In the literature review process, we initially identified 500+ papers relevant to GEE. We then performed a systematic search based on the following strategies: (1) keyword search on Google Scholar: the keywords used for our literature search were “Google Earth Engine” AND “machine learning” OR “deep learning” OR “computer vision”; (2) reference tracking: we went through the papers cited in recent GEE reviews [14,18,27,28,29] (i.e., the “References” lists of those papers) and also tracked the last two years' worth of new papers citing the existing GEE review papers on their Google Scholar pages. Note that our search was restricted to research articles published in English in peer-reviewed journals or conference proceedings. A total of 200 highly relevant articles were identified by excluding papers that purely use GEE for RS data download or that do not use AI (including its branches CV, ML, and DL). Figure 1 shows the spatial distribution and summary statistics of the papers covered in this review. The number of papers published per year (2015 to 2022) has increased dramatically since 2019. “Remote Sensing” and “Remote Sensing of Environment” are the leading journals in which most GEE and AI papers are published. In addition, most first authors' institutions are based in China and the United States. (Note that a freely accessible interactive version of the map and all charts throughout the paper can be accessed via our web app tool; the web app tool URL and a brief demo video are provided in Appendix A.)

1.2. Roadmap

Here, we provide a roadmap to the rest of the paper. Section 2 outlines the scope of this review and our intended audience. Section 3 is the core of the paper, focused on identifying important recent developments and their implications in terms of applications (Section 3.2) and novel methods (Section 3.3) that leverage GEE and AI. Section 3 covers a wide array of recent research combining GEE and AI from multiple domains with many cross-connections. The paper concludes in Section 4 with a discussion of key challenges and opportunities, from both application (Section 4.2) and technical (Section 4.3) perspectives. Specifically, we focus on the main challenges preventing GEE and AI integration, as well as possible future research directions. To make the substantial number of papers we reviewed (200 total) more transparent and easier to retrieve and understand, we developed an interactive web tool (see Appendix A for details). As evaluation metrics are essential for measuring the performance of AI/ML/DL/CV models, we provide a set of commonly used evaluation metrics in Appendix B. To keep the main text concise, each application area detailed in Section 3.2 contains a table and a brief textual summary of the papers in that field; a more detailed and comprehensive summary for each section can be found in Appendix C for those who are interested. Lastly, as this paper uses many acronyms, we provide a full list of abbreviations immediately before Appendix A, Appendix B and Appendix C.

2. Scope and Intended Audience

It is very challenging to repeatedly produce up-to-date, accurate maps and information, especially at large scales, for many important applications and monitoring systems, due to the time, effort, and cost involved. As larger volumes of geospatial data become available, an ever-increasing number of techniques for analyzing them has expanded the number and scope of monitoring applications (e.g., global water mapping [32], forest and deforestation monitoring [33], and global climate change research [34]).
Downloading, analyzing, and managing a multi-decadal time series of satellite imagery over large areas is not practical using desktop computing resources [35]. Complementing the huge volumes of open-access satellite data are new technologies and services (e.g., cloud computing, AI) that are shifting the manner in which RS data are used for environmental monitoring. These technologies and platforms present opportunities for new advances in data collection for monitoring climate change mitigation, particularly where traditional means of data exploration and analysis, such as government-led statistical census efforts, are costly and time consuming [36]. GEE [37], launched on 2 December 2010, has received significant attention in the earth science community because it provides free and dedicated geospatial data resources and services, including RS imagery storage, preprocessing routines, and hosted AI algorithms all in one place. Similar cloud computing platforms and services that support geospatial data hosting and/or computation include the NASA Earth Exchange (NEX) [38] (2013) and Geostationary-NEX (GeoNEX) [39] (2020), Earth on Amazon Web Services (AWS) [40] (launched in September 2016), Microsoft's Azure services (Geospatial Analytics in Azure Stream Analytics) [41] (launched in 2017), and the Microsoft Planetary Computer [42] (launched in December 2020). GEE has a larger community and more data and algorithms in one place, whereas these other systems arrived later and are not yet as robust. Building on the emerging need to process and analyze big RS data with GEE and AI via cloud computing for many domain problems, one of our major goals, as with any review paper, is to survey recent work on GEE with AI and provide suggestions for new directions built upon these evolving methods. Another important goal of this paper is to provide a bridge between GEE and AI researchers and practitioners, especially those who have interdisciplinary backgrounds and expertise.
It is our hope that this paper helps move towards a smoother and deeper integration of GEE with AI. As our comprehensive investigation shows, there are still several open challenges preventing researchers from using GEE and AI for their research (for example, the lack of options for those interested in using DL models on the platform, detailed further in Section 4.1.2).
This comprehensive review is relevant to any research, practice, and education domains that could take advantage of RS imagery coupled with AI and cloud computing, including, but not limited to, RS, GIScience, earth science, geosciences, computer science, data science, information science, hydroinformatics, and image analysis. This paper does not attempt to review publications that use GEE without utilizing AI methods (for recent reviews of GEE, see [14,18,28,29]), nor to review DL with RS (recent reviews on DL with RS can be found in [30,31,43,44]). This review focuses on investigating recent GEE work that has integrated AI, including its branches ML, DL, and CV, for a wide range of applications (e.g., crop, wetland, and water mapping, detailed in Section 3.2). From our review, only a small subset of papers contributes significantly towards implementing novel AI architectures or methods within GEE (detailed in Section 3.3).
To our knowledge, no review solely focuses on the combination of GEE and AI. Our review has a narrowed scope of GEE integrated with AI, but a wider and deeper scope in terms of AI methods and metrics applied to many domains on the GEE platform. Other GEE review papers have sections for ML models (“X papers use random forest (RF) models and Y use Support Vector Machines (SVMs) …”) as part of a general GEE review, but do not explicitly focus on the implementation of AI models in GEE. In addition, what significantly distinguishes our review from other GEE survey papers is an interactive web app, named iLit4GEE-AI (see Appendix A). We developed this app to allow our readers to intuitively retrieve relevant GEE with AI literature that fits their needs. For example, a user can very quickly filter for published articles that used RF models and the F1-score for wetland mapping. Most importantly, iLit4GEE-AI will serve as a live and interactive literature repository for integrating GEE with AI, as we will continue to update the data in the web app. In the future, we hope our web app will serve as an important and up-to-date resource for the GEE and AI research and practice community. Through our deep, thorough, and interactive investigation (see Appendix A for a visual, interactive investigation using our web app iLit4GEE-AI), we hope to develop a basis for a smoother and deeper integration of GEE and AI, which will help move many domains forward. Further, many of the domains presented in this paper (Section 3.2) are highly related, as different aspects of our environment are inherently linked. By aggregating research across domains and making it searchable and filterable, we hope to spur innovation, collaboration, and code sharing between researchers in the pursuit of tackling cross-disciplinary, complex issues such as those related to global warming.
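The kind of faceted retrieval described above can be sketched in a few lines; the paper records below are entirely hypothetical and are not data from the actual iLit4GEE-AI app:

```python
# A toy sketch of faceted literature filtering (hypothetical records only).
papers = [
    {"title": "Wetland mapping with RF", "model": "RF",
     "metric": "F1-score", "application": "wetland"},
    {"title": "Crop yield with CNN-LSTM", "model": "CNN-LSTM",
     "metric": "R2", "application": "crop"},
    {"title": "Water extent with SVM", "model": "SVM",
     "metric": "OA", "application": "water"},
]

def filter_papers(records, **criteria):
    """Return records matching every given facet, e.g. model='RF'."""
    return [r for r in records
            if all(r.get(k) == v for k, v in criteria.items())]

hits = filter_papers(papers, model="RF", metric="F1-score",
                     application="wetland")
```

Here `hits` contains only the single hypothetical wetland paper, mirroring the "RF + F1-score + wetland mapping" query described in the text.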
For example, water body identification, deforestation monitoring, and wildfire detection are all separate domains, but researchers and practitioners in different domains may use common data sources, processing methods, and algorithms in their final results. As we continue to compile papers written at the intersection of GEE and AI via our web app tool iLit4GEE-AI, it will become easier for researchers to find relevant literature and code resources even if they are from different areas of study.

3. The State of the Art: GEE with AI

In this section, we first provide an overview of the 200 reviewed studies using GEE and AI (Section 3.1). We then investigate studies leveraging GEE with AI from the perspective of applications (Section 3.2) and highlight those with novel methods in Section 3.3.

3.1. Overview of the Reviewed Studies

In this paper, we have reviewed 200 papers. A word-cloud visualization of the titles and keywords of the 200 reviewed papers is provided in Figure 2. The word clouds convey both the general and specific focus of the reviewed papers. The most frequently used words are “Google Earth Engine”, “classification”, “imagery”, “machine learning”, “mapping”, “remote sensing”, and “detection”. There are also more specific keywords, such as “cloud”, “water”, “forest”, “crop”, “soil”, “fire” and “urban”, reflecting many of the categories we identified for this review paper. Additionally, “Landsat”, “Sentinel-1”, “Sentinel-2”, and “SAR” (synthetic aperture radar), as well as “China”, “Brazil”, and “Asia”, detail some of the many datasets and study areas from these 200 papers. Note that we only included and reviewed papers that integrate GEE and AI (including its branches ML/DL/CV).
Figure 3 shows that most published work leveraging the power of GEE integrated with AI is still at the application stage and that there is room to develop novel methods to advance earth observation in relevant fields. To break this down further, in (b) we can see that ML is the dominant method, and in (c) that the most common task is classification. In Figure 4, the primary applications of GEE integrated with AI are crop, LULC, vegetation, wetland, water, and forest mapping, and the primary study areas are China, Brazil, and the United States. The most-used RS data types are Landsat 8 OLI and Sentinel-2. From Figure 5, we see that the most-used ML models are RF, SVM, and CART, while the top evaluation metrics are overall accuracy (OA), producer's accuracy (PA), user's accuracy (UA), and Kappa. (Note that a freely accessible interactive version of the map and charts can be accessed via our web app tool; the web app tool URL and a brief demo video are provided in Appendix A; in Appendix B, we provide an introduction to the commonly used evaluation metrics.)
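For readers less familiar with these accuracy metrics, the following self-contained sketch computes OA, PA, UA, and Cohen's Kappa from a confusion matrix (rows are reference classes, columns are predicted classes); the example matrix is made up for illustration:

```python
# Standard thematic-accuracy metrics computed from a confusion matrix.
def accuracy_metrics(cm):
    n = sum(sum(row) for row in cm)          # total samples
    k = len(cm)                              # number of classes
    diag = sum(cm[i][i] for i in range(k))   # correctly classified samples
    row_tot = [sum(cm[i]) for i in range(k)]                        # reference totals
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]   # predicted totals
    oa = diag / n                                      # overall accuracy
    pa = [cm[i][i] / row_tot[i] for i in range(k)]     # producer's accuracy (per class)
    ua = [cm[j][j] / col_tot[j] for j in range(k)]     # user's accuracy (per class)
    pe = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)  # chance agreement
    kappa = (oa - pe) / (1 - pe)                       # Cohen's Kappa
    return oa, pa, ua, kappa

# Example: a 2-class matrix with 90 correct out of 100 samples.
oa, pa, ua, kappa = accuracy_metrics([[45, 5], [5, 45]])
```

For this balanced example, OA is 0.9 and Kappa is 0.8; Kappa is lower because it discounts the agreement expected by chance.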

3.2. Advances in Applications

We organized the following subsections by total citation count, so readers start with the thematic research area that has the highest number of citations using ML/DL/CV on GEE. As readers move through Section 3.2, they will then cover topics with a less developed presence on GEE (that the authors are aware of) that also utilize ML/DL/CV. Each subsection includes a table listing, for each reference, information such as the study area, RS data type, and the ML/DL model or CV algorithm used. Note that each table in this section is ordered chronologically to show trends in data type and model usage. Each table is accompanied by a word cloud showing terms from paper titles and the keywords given by the authors. For Section 3.2.8, Section 3.2.9, Section 3.2.10, Section 3.2.11, Section 3.2.12, Section 3.2.13, Section 3.2.14, Section 3.2.15, Section 3.2.16, Section 3.2.17 and Section 3.2.18, there are not enough publications to make the word clouds informative, so in addition to titles and keywords we also include the abstract text. Below each table are accompanying summaries for each reference in the table. References marked with an “*” denote papers that used novel methods or went beyond a straightforward application of data and methods on GEE. In this paper, we take a very narrow view of what constitutes a novel method: using ML/DL/CV models and algorithms in new ways on GEE (see Section 3.3 for more details on novel GEE methods). This means that even if a paper combined data in a new way or developed a new data preprocessing method, it was deemed an application, since our focus is on ML/DL/CV methods. Each paper reviewed in Section 3.2 is grouped into a specific subsection, ranging from 2 references (Bathymetric mapping) to 37 (Crop mapping).
Note that many papers could fit into several different sections, so there is some subjectivity in the assignment of categories, and readers should be aware of the overlap. Generally, most papers could be classified as LULC, or “land use and land cover”: agriculture, vegetation, water, and forests are all classes commonly mapped in LULC analyses. Our intention was to represent the focus of a given paper. For example, a paper that predicted several classes but focused on producing deforestation maps is listed under “Forest and deforestation monitoring”, even if “LULC” appears in its title. Similarly, a paper creating vegetation maps of tidal flats appears under “Vegetation mapping” and not “Wetland mapping”. As another example, if authors monitored vegetation or water indices in RS imagery with the goal of tracking reclamation progress or pollution levels at mining sites, their paper is found under “Heavy industry and pollution monitoring”. Only where the express goal was to create a general LULC map does a paper go under “Land cover classification”.

3.2.1. Crop Mapping

Crop mapping is the most well-developed application using GEE and AI (37 studies). Table 1 summarizes those studies, and a word cloud generated from their titles and keywords is provided in Figure 6. The most frequently used words are “Google Earth Engine”, “mapping”, and “classification”. However, the term “yield” also reflects that regression tasks such as yield prediction are almost as common as classification tasks such as producing maps. Additionally, specific words such as “30-m” and “Asia” indicate the spatial resolution and coverage of the RS imagery used in the reviewed papers. From our interactive web app (see Appendix A) and Table 1, Landsat 8 Operational Land Imager (OLI), Shuttle Radar Topography Mission (SRTM) DEM, and Sentinel-2 are the most-used RS data. The most popular AI models are RF, SVM, Classification and Regression Trees (CART), and k-means; the most-used evaluation metrics are user's accuracy (UA), producer's accuracy (PA), Kappa, and R2. A brief summary of those studies is provided below Table 1, and more detailed textual summaries for most of the reviewed crop mapping studies are provided in Appendix C.1.
Creating country-level, crop-specific maps using RS data can be difficult because of the large amount of data involved. GEE provides data storage and online processing capabilities, greatly ameliorating the issues of downloading data and managing computing resources. Several algorithms (CART, IKPamir, logistic regression, an MLP, NB, RF, and an SVM) were compared on GEE in [46] for crop-type classification in Ukraine. The authors also used an ensemble NN but had to move off the GEE platform, since NNs were not supported at the time. It is often difficult to map croplands on a large scale using RS imagery because of a lack of ground-truth validation data; there are also problems relating to differing cultivation techniques and definitions of what constitutes cropland. To address these issues, the authors in [67] collected large amounts of training points from Google Earth imagery and analyzed Landsat and DEM data to create a cropland data layer across Europe, the Middle East, and Russia. Accurate classification and mapping of crops is essential for supporting sustainable land management. A two-step approach for crop identification in the central region of Ukraine was developed in [52] by exploiting intra-annual variation in the temporal signatures of remotely sensed observations (Sentinel-1 and Landsat images) and prior knowledge of crop calendars. Crop maps are often created using vegetation indices and field observation data. The authors in [73] argued that this may lead to datasets and ML models that can only predict in specific areas and cannot generalize to larger areas (i.e., regions or countries) or to other time periods in the same area. They further argued that what is needed is a more generalized method that can take in information such as weather, climate, or DEM data and scale up to field-level predictions or larger.
Agricultural expansion can harm ecosystems and their biodiversity. Producing crop-type maps using RS imagery and ML is one way to help monitor agricultural expansion over large areas, and these maps in turn can help policymakers and land-use managers make more informed decisions about current and future land use. However, creating the maps normally requires a large amount of data, and it is not straightforward to pick an ML model that will perform well with that data. There is also the concern that the model's predictions will be uninterpretable, given that many ML and DL models are so-called “black boxes”. To get around this issue, the authors in [79] trained a maximum likelihood model and a fuzzy-rules classifier to determine paddy rice distribution in Iran. Plants look very different in RS imagery depending on the imagery type, and also over the course of a plant's lifetime. This is especially true of crops like rice, so it is important to incorporate phenological information in order to monitor them over time. Over a three-year period, the authors in [75] mapped paddy rice using Sentinel imagery by utilizing several different spectral indices and creating composites of different paddy rice growth periods. Continued agricultural expansion threatens many highly biodiverse ecosystems around the globe. Monitoring agricultural expansion is one part of making timely decisions related to water and soil health, in addition to pollution caused by fertilizer use. Mapping croplands over a large scale with NNs and high-resolution RS imagery has produced highly accurate maps, but NNs are computationally expensive to train. A U-Net was used in [71] to map sugarcane in Thailand, with a lightweight NN as the encoder of the DL model to reduce computing costs.
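The spectral indices mentioned above can be illustrated with NDVI, the most common vegetation index; the reflectance values below are made up for illustration:

```python
# NDVI computed per pixel from near-infrared (NIR) and red reflectance.
def ndvi(nir, red):
    """Normalized Difference Vegetation Index, in [-1, 1]."""
    return (nir - red) / (nir + red) if (nir + red) else 0.0

# Dense vegetation reflects strongly in NIR and absorbs red light,
# so vegetated pixels score much higher than bare soil.
vegetated = ndvi(0.45, 0.05)
bare_soil = ndvi(0.25, 0.20)
```

Tracking how an index like this rises and falls across a growing season is what allows phenology-based mapping of crops such as paddy rice.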
Sugarcane grows in rainy conditions in complex landscapes, making it difficult to map. However, phenology information can help identify sugarcane in high-resolution RS imagery, as shown in [56], where the performance of an ANN was compared to CART, RF, and SVM models on GEE for sugarcane mapping in China using Sentinel-2 imagery. Shade-grown coffee landscapes are critical to biodiversity in the forested tropics, but mapping them is difficult because of mountainous terrain, cloud cover, and spectral similarity to more traditional forested landscapes. Landsat, precipitation, and DEM data were used in [50] to map shade-grown coffee in Nicaragua using an RF model. Accuracy scores across different land class types (including shade-grown coffee) were high, and a relative variable importance analysis showed which data contributed most to the RF model's performance. It is difficult to know beforehand the effect different datasets will have on producing LULC maps. It is therefore useful to compare the performance of an ML classifier on different datasets such as Landsat and Sentinel imagery, so that future researchers know which datasets fit their application. The differences between Landsat and Sentinel imagery were explored in [78] for identifying cotton in China over the course of the plant's life cycle.
Crop maps are increasingly produced at the national and global levels, but this process requires substantial computing resources. Cloud computing offers free access to data and computing, yet many studies producing crop maps and crop yield estimates do not take advantage of these resources. In the United States, crop yield estimates for soybeans start very late in the season, but early estimates are needed to inform management decisions such as when to harvest. The authors in [55] used a CNN-LSTM hybrid model to predict soybean yield in the contiguous United States using RS imagery alongside weather data and showed that the hybrid approach works better than either a CNN or an LSTM alone, although the results were better in some states than in others. Additionally, the authors created combinations of input data to determine which variables were most important in training their NN. Still, the authors had to move their DL training off the GEE platform because GEE did not support NN architectures at the time. Many variables, including climate/weather, fertilizer, soil, economic, and hydrological data, can be incorporated into crop yield prediction simulation models. However, the amount of data needed to make the crop models accurate is often not available in specific countries, or is too time-consuming and cost-intensive to collect and maintain. RS imagery can help fill this need by providing open data over long temporal scales with global coverage, regardless of country. The authors in [66] demonstrated that by using climate and soil data with RS imagery on the GEE platform, it was possible to predict winter wheat yields 1–2 months ahead of harvesting in China. Producing crop type maps is often a useful first step in predicting crop yield. However, crop type maps derived from lower-resolution RS data suffer from uncertainties in areas where soil, crops, and plants are heavily mixed, and current cropland products only focus on a subset of staple crops.
Optical and SAR Sentinel data were combined in [72] to create higher-resolution maps capable of displaying information on less commonly mapped non-staple crops in the US.
It is challenging to map cropland extent over large countries or regions in a rapid, repeatable, and accurate manner. This is in part due to the large amount of RS imagery that is usually required to make these maps, in addition to the need to access validation datasets in comparable formats across geo-political boundaries. Even when this is possible, crop maps are created using coarse RS imagery, limiting the utility of the output crop maps. In [16], the authors fed RS imagery along with elevation and government data from Australia and China into an RF model to produce crop extent maps at 30 m, 250 m, and 1 km resolutions. It is difficult to achieve continuous, cloud-free imagery in Australia and China over time, so this analysis depends on creating bi-monthly composites. The authors noted that this analysis could have benefitted from a larger dataset, in addition to comparing more classification algorithms to help reduce uncertainties from the RF model. LAI and the fraction of photosynthetically active radiation (FPAR) are two important features when producing crop extent maps and crop yield estimates. However, most current products for producing crop extent maps and crop yield estimates are derived from low-resolution RS imagery. In order to produce these maps and estimates at a higher resolution, the authors in [76] utilized GEE, Sentinel-2 imagery, and field data to train an RF model to first estimate LAI and FPAR at a much finer spatial scale.
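The bi-monthly compositing step can be illustrated with a minimal numpy sketch. In GEE this is typically a cloud-masking function mapped over an `ee.ImageCollection` followed by `median()`; here, cloud-masked observations are marked with `np.nan` and a per-pixel median is taken over the stack (all values are made up):

```python
# Minimal numpy sketch of cloud-masked median compositing, the idea behind
# the bi-monthly composites used in [16]. In GEE this is typically
# collection.map(maskClouds).median(); here we emulate it on a tiny stack.
import numpy as np

# Stack of 4 "scenes" over a 2x2 area; np.nan marks cloud-masked pixels.
stack = np.array([
    [[0.30, np.nan], [0.25, 0.40]],
    [[0.32, 0.55],   [np.nan, 0.42]],
    [[np.nan, 0.50], [0.27, np.nan]],
    [[0.31, 0.52],   [0.26, 0.41]],
])

# Per-pixel median, ignoring masked (cloudy) observations.
composite = np.nanmedian(stack, axis=0)
```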
Global crop maps often fail to capture small farms because the resolution of the RS imagery used to create the maps is too coarse. Additionally, agricultural areas change over time, and so the underlying validation data (which are hard to acquire in the first place) often change. Thus, producing high-resolution maps that accurately track crop production over time has proved difficult. Landsat-8 and Sentinel-2 imagery were combined in [47] with elevation data to produce a crop map across continental Africa on the GEE platform. Crop maps that cover a large area are often created from coarse RS imagery. This poses problems with identifying small or fragmented farms, as well as farms that are mixed-use or have several crop types over the same small area. Several attempts have been made to map land-use classes over large areas, but these maps do not focus specifically on crops, and so their utility to food production studies is limited. To address these issues, [54] used RS imagery from several different platforms (GeoEye, Landsat, NGA, Quickbird, WorldView) to produce a 30-m resolution crop map for Southeast and Northeast Asia. Using an RF model, the authors achieved high accuracy rates across several crop type classes and made the resulting data layer public. However, to create cloud-free scenes from optical imagery across countries, the authors had to rely on multi-year composites. The authors noted that in the future, a harmonized Landsat–Sentinel dataset would be useful to expand spatial and temporal data coverage.
Sustainable management of agricultural water resources requires improved understanding of irrigation patterns in space and time. Annual irrigation maps (1999–2016) in the US Northern High Plains were produced in [49] by combining all available Landsat satellite imagery with climate and soil covariables in an RF classification workflow. In [51], the authors implemented an automatic irrigation mapping procedure in GEE that uses surface reflectance satellite imagery from different sensors (Landsat 7/8, Sentinel-2, MODIS Terra and Aqua imagery, SRTM DEM). A rapid method was developed to map Landsat-scale (30 m) irrigated croplands in [58] across the conterminous United States (CONUS). The method was based upon an automatic generation of training samples for most areas, based on the assumptions that irrigated crops appear greener than non-irrigated crops and have limited water stress.
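The greenness assumption behind the automatic training-sample generation in [58] can be sketched as a simple thresholding rule; the NDVI values and margins below are illustrative, not the thresholds used in the study:

```python
# Sketch of automatic training-sample generation under the assumption from
# [58] that irrigated crops appear greener (higher peak NDVI) than
# non-irrigated crops. Values and thresholds here are illustrative only.
import numpy as np

peak_ndvi = np.array([0.82, 0.45, 0.78, 0.50, 0.85, 0.40, 0.76, 0.48])

# Label pixels far above/below the regional mean; leave the rest unlabeled.
mean = peak_ndvi.mean()
labels = np.full(peak_ndvi.shape, -1)     # -1 = unlabeled
labels[peak_ndvi > mean + 0.1] = 1        # candidate irrigated samples
labels[peak_ndvi < mean - 0.1] = 0        # candidate non-irrigated samples
```

The labeled pixels could then serve as automatically generated training data for a supervised classifier, with ambiguous (unlabeled) pixels excluded.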
Cropland classification is highly dependent on RS imagery resolution, the scale of a given analysis, the processing steps, and the input training data. Coarse-resolution cropland data products have been found to contain large errors, but even higher-resolution maps tend to have low accuracy rates and overestimate overall crop area. An open-source map was created in [19] for several West African countries using an RF model trained on Landsat data. The amount of RS data collected is increasing every day. This poses a problem for how best to analyze RS imagery and extract useful information from it, regardless of the EO domain. The authors in [77] implemented a dynamic feature importance tool that automatically finds the most important subset of input features for identifying crop types in China. They fed these features to the SNIC algorithm and then to an RF on GEE and combined the output predictions with growth period information to produce crop-type maps that incorporate plant phenology. By incorporating growth stage information as an input feature to the ML model, the authors achieved a 6–7% boost in OA, precision, and recall across different crops like rice, maize, and soybeans. The authors also showed that red edge, NDVI, red, SWIR2, and aerosol information contributed the most to their analysis. However, the authors themselves stated that their method was unstable due to the nature of their feature importance algorithm: depending on which features it selected, the accuracy of the method fluctuated. Thus, their method is well suited to reducing data size when compute is limited, though using all of the data in a given time series was shown to work better.
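In the same spirit as the feature-selection step in [77] (though not the authors’ actual algorithm), an importance-ranked subset of input features can be taken from a trained RF and a smaller model retrained on it. The synthetic “bands” below are illustrative:

```python
# Sketch of feature-subset selection via RF importances: train a full model,
# keep the top-ranked features, and retrain on the reduced set. The six
# synthetic "bands" are illustrative; only the first two are informative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n = 400
X_informative = rng.normal(size=(n, 2))
X_noise = rng.normal(size=(n, 4))
X = np.hstack([X_informative, X_noise])
y = (X[:, 0] + X[:, 1] > 0).astype(int)

full_rf = RandomForestClassifier(n_estimators=200, random_state=1).fit(X, y)

# Keep the top-2 features by importance and retrain a smaller model.
top2 = np.argsort(full_rf.feature_importances_)[-2:]
reduced_rf = RandomForestClassifier(n_estimators=200, random_state=1).fit(X[:, top2], y)
```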

3.2.2. Land Cover Classification

Land cover classification is the second-most-developed domain area using GEE and AI (27 studies total). Table 2 below summarizes those studies, and a word cloud generated from the titles and keywords of those papers is provided in Figure 7. The most frequently used words are “Google Earth Engine”, “land”, “cover”, and “classification”. “Landsat” RS imagery features heavily in LULC research, though the trend is moving towards higher-resolution data and creating maps over much larger areas. The words “Sentinel” and “Africa” illustrate this point well.
From our interactive web app (see Appendix A) and Table 2, the most-used RS datasets are Landsat 8 OLI, SRTM DEM, and Google Earth. The most popular AI models are RF, CART, and SVM, and the most-used evaluation metrics are overall accuracy (OA), PA, UA, and Kappa. A brief summary of those studies is provided right below Table 2. More detailed textual summaries for most of the reviewed land cover classification studies are provided in Appendix C.2.
LULC maps can help decision-makers and land managers make more informed decisions about the environment. Still, producing LULC maps with ML and RS data requires substantial compute and labeled input training data. GEE currently offers free compute, so researchers can use the data that they are interested in without having to worry about hardware setup or compute time. The authors in [102] took advantage of this to create an LULC map of Northern Iran, predicting for water, rangelands, built-up areas, orchards, and other LULC classes. They used Landsat RS imagery, field observations, and historical datasets to train CART, RF, and SVM models. The SVM performed better than the CART and RF models, but, perhaps more importantly, the authors also ran a spatial uncertainty analysis to show each model’s confidence level on the output maps. More research should incorporate uncertainty into reported metrics or into maps produced with ML to better convey a model’s certainty to both citizens and decision-makers.
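A per-pixel confidence layer of the kind reported in [102] can be derived from class probabilities. The scikit-learn sketch below uses synthetic two-band classes and hypothetical class names; it illustrates the idea, not the authors’ workflow:

```python
# Sketch of a per-pixel confidence map from classifier probabilities: the
# maximum class probability indicates how certain the model is about its
# prediction. Training classes (e.g. "water" vs. "built-up") are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
X_train = np.vstack([
    rng.normal(0.1, 0.03, size=(200, 2)),   # class 0 cluster
    rng.normal(0.6, 0.03, size=(200, 2)),   # class 1 cluster
])
y_train = np.array([0] * 200 + [1] * 200)
clf = RandomForestClassifier(n_estimators=100, random_state=2).fit(X_train, y_train)

# "Pixels" to map: one clearly class 0, one clearly class 1, one mixed.
pixels = np.array([[0.10, 0.10], [0.60, 0.60], [0.35, 0.35]])
proba = clf.predict_proba(pixels)
confidence = proba.max(axis=1)   # per-pixel confidence in the predicted class
```

Rendering `confidence` as a raster alongside the class map is one way to convey model certainty to decision-makers.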
Storing RS data across different machines and running different ML algorithms on them currently carries high data and computational costs. There is an additional challenge in that most RS analyses depend on optical data, which is often obscured by clouds and shadows. In addition, most land cover maps have coarse resolution and often do not describe the same things as other maps (making them not directly comparable). These static maps need to be more accurate and updated frequently to be of real use, and cloud computing, with data and algorithms in one place, has allowed both of these to become a reality. An RF model was used in [80] to determine land-use classes such as vegetation, croplands, and urban areas from Landsat imagery in Zambia. An approach was presented in [81] to quantify continental land cover and impervious surface changes over continental Africa for 2000–2015 using Landsat images and an RF classifier on GEE. Simple change detection based on Landsat images from two different years with two different phenophases yields unsatisfactory results and may induce many misclassifications and pseudo-change identifications because of the phenological differences between RS images. A land-use/land-cover type discrimination method based on a CART was proposed in [82], which applied change-vector analysis in posterior probability space (CVAPS) and the best histogram maximum entropy method for change detection, and further improved the accuracy of the land-updating results in combination with NDVI timing analysis. The last land-cover map of Iran was produced with MODIS imagery in 2016. Now, there are much higher resolution satellite data products, but it is difficult to collect more ground-truth validation data. Cloud computing and ML can help produce newer land cover classification maps that are easy to reuse. Such a workflow was designed in [93] on GEE for Iran using Sentinel-1 and -2 data with an RF model and SNIC.
With the ground-truth training samples available, the authors used SNIC to segment land-use classes into objects while the RF model classified them at the pixel level.
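The combination of segmentation and per-pixel classification can be sketched in numpy: given segment ids (as SNIC would produce) and noisy per-pixel predictions, each segment is assigned the majority class of its pixels, suppressing salt-and-pepper noise. Both arrays are illustrative:

```python
# Object-based smoothing sketch: assign each segment the majority class of
# its pixels. Per-pixel predictions and segment ids are illustrative; in
# GEE, segments would come from ee.Algorithms.Image.Segmentation.SNIC.
import numpy as np

pred = np.array([          # noisy per-pixel class predictions
    [1, 1, 0, 2],
    [1, 1, 2, 2],
    [0, 1, 2, 2],
    [1, 1, 2, 1],
])
segments = np.array([      # segment ids (two segments: left and right)
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
])

smoothed = pred.copy()
for seg_id in np.unique(segments):
    mask = segments == seg_id
    values, counts = np.unique(pred[mask], return_counts=True)
    smoothed[mask] = values[np.argmax(counts)]   # majority vote per segment
```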
Numerous efforts have been made to end poverty around the globe. Mapping land-use changes in poverty areas can provide insights into the poverty reduction progress. Landsat images available on GEE were utilized in [83] to map annual land-use changes in China’s poverty-stricken areas. An open-source land cover mapping processing pipeline was created in [87] using GEE. The authors argued that land cover maps specifically can help countries properly plan for sustainable levels of food production, but that many developing countries did not have the financial or compute resources to monitor land classes in real time. Using SVM and bagged trees (BT) models, the authors predicted urban, agriculture, tree, vegetation, water, and barren land-use types in Lesotho.
In RS imagery, many different land-use types have similar spectral signatures or are very complex, making them difficult to identify properly. Several different ML models available on GEE were trained in [92] with different combinations of input data to determine which were the most important in determining land-use types in Golden Gate Highland Park in China. Although RS and ML have allowed LULC analysis to become ever more accurate for general LULC classes, it is still challenging to correctly identify land subtypes. For example, while classifying vegetation to a high degree of accuracy has become more commonplace, identifying vegetation subtypes like shrubs or grassland is not as straightforward, especially in mixed-use areas. In addition, as is the case for many RS applications, it is challenging to know which types of input data will contribute to a given ML model’s ability to learn these subtypes. Therefore, the authors in [95] set out to compare the contribution of SAR data and different indices (NDVI, EVI, SAVI, NDWI) derived from optical data to overall classifier performance. A land cover map of the whole African continent at 10 m resolution was generated in [98], using multiple data sources including Sentinel-2, Landsat-8, the Global Human Settlement Layer (GHSL), Night Time Light (NTL) data, SRTM, and MODIS Land Surface Temperature (LST). Different combinations of data sources were tested to determine the best data input configurations. Pixel-based classification methods often suffer from “salt-and-pepper” noise in their end predictions. Object-based classifiers can help alleviate this problem but are not commonly used because of their high compute overhead. While GEE does not have many object-based classifiers, it does provide free compute. To take advantage of this while comparing the performance of pixel-based and object-based classification methods, [100] produced LULC maps in Italy using Landsat, Planet, and Sentinel RS imagery.
The authors compared the performance of RF and SVM models alone with that of the same models used in conjunction with SNIC and gray-level co-occurrence matrix (GLCM) texture data. Their results showed that pixel-based methods worked better at lower resolutions (i.e., using Landsat data), whereas object-based methods worked better for higher-resolution RS imagery. The best classifier was the RF model trained with SNIC and incorporating GLCM data. Still, the authors noted that ML models were heavily influenced by the input data, feature engineering, the classes being predicted, and the study area. Many studies evaluate ML methods and the effect that input data sources have on their performance. Less research has examined how data sampling strategies affect ML classifiers. The authors in [101] compared different data sampling strategies and their effects on how different ML classifiers performed on LULC tasks. A multi-seasonal sample set was collected in [88] for global land cover mapping in 2015 from Landsat 8 images. The concept of “stable classification” was used to approximately determine how much reduction in training samples, and how much land cover change or image interpretation error, is acceptable.
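A GLCM texture feature of the kind combined with SNIC in [100] can be computed in a few lines. This sketch builds the co-occurrence matrix for a single horizontal offset and derives the contrast statistic; the patch values are made up:

```python
# Minimal GLCM sketch: count how often gray level i occurs immediately to
# the left of gray level j (offset (0, 1)), then compute the contrast
# statistic from the normalized matrix. The 4-level patch is illustrative.
import numpy as np

patch = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
])
levels = 4
glcm = np.zeros((levels, levels), dtype=int)
for i in range(patch.shape[0]):
    for j in range(patch.shape[1] - 1):
        glcm[patch[i, j], patch[i, j + 1]] += 1

# Contrast: sum of P(i, j) * (i - j)^2 over the normalized matrix.
p = glcm / glcm.sum()
idx_i, idx_j = np.indices(glcm.shape)
contrast = (p * (idx_i - idx_j) ** 2).sum()
```

In practice, per-pixel texture bands like this (computed over a moving window and for several offsets) are appended to the spectral bands before classification.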
Mountain Land Cover (MLC) classification can be relatively challenging due to high spatial heterogeneity and cloud contamination in optical satellite imagery over mountainous areas. The distribution of Land Cover (LC) classes in these areas is mostly imbalanced. To date, three approaches have been proposed to address the class imbalance problem: (1) applying specific classification methods that focus on learning minority classes, (2) assigning higher weights to minority classes by adjusting classifiers, and (3) rebalancing training datasets (e.g., oversampling and under-sampling techniques). A hybrid data-balancing method, called Partial Random Over-Sampling and Random Under-Sampling (PROSRUS), was proposed in [96] to resolve the class imbalance issue. The class imbalance problem reduces classification accuracy for infrequent and rare LC classes. A new method was proposed in [97] that integrates random under-sampling of majority classes with an ensemble of Support Vector Machines, namely the Random Under-sampling Ensemble of Support Vector Machines (RUESVMs). Rapid urban expansion puts pressure on local ecosystems and human well-being, so urban sustainability studies are increasingly turning to applications that process large amounts of geospatial data and model ecosystem services. Currently, it is not straightforward for urban or ecology scientists to use cloud-based platforms like GEE, as their processing routines are more complicated than the many common mapping applications (i.e., classification) available on GEE. While determining ecosystem service values is complicated (many disciplines, many opinions, etc.), GEE was used in [94] to illustrate a processing workflow for how LULC classes can be used to compute more complex ecosystem service values.
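The rebalancing idea in approach (3) can be sketched as a generic hybrid resampler that over-samples minority classes and under-samples majority classes to a common target; this is the general principle behind methods like PROSRUS [96] and RUESVMs [97], not those algorithms themselves:

```python
# Generic hybrid resampling sketch: bring every class to `target` samples by
# random under-sampling (majority) or over-sampling with replacement
# (minority). The toy data and class names are illustrative.
import numpy as np

rng = np.random.default_rng(3)

def rebalance(X, y, target, rng):
    """Randomly over-/under-sample each class to exactly `target` samples."""
    Xs, ys = [], []
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)
        # Sample with replacement only when the class is smaller than target.
        chosen = rng.choice(idx, size=target, replace=len(idx) < target)
        Xs.append(X[chosen])
        ys.append(y[chosen])
    return np.vstack(Xs), np.concatenate(ys)

# Imbalanced toy set: 90 "forest" pixels vs. 10 "bare ground" pixels.
X = rng.normal(size=(100, 3))
y = np.array([0] * 90 + [1] * 10)
X_bal, y_bal = rebalance(X, y, target=50, rng=rng)
```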
Watersheds around the world are under stress, due both to climate change and to human disturbance. LULC maps can help with planning and conservation decisions, but they are often difficult to produce because doing so is compute-intensive. GEE has helped many researchers by providing freely available data, methods, and compute, but researchers often find that they run into compute limits on the platform before they can complete their analyses. To overcome these compute limits in GEE, the authors in [103] used feature reduction techniques and designed their own parallel processing algorithms to produce an LULC map across several Middle Eastern countries. To get a better idea of how water resources were being affected by LULC classes, the authors combined topographic data, spectral data, RS image composites, and texture information to train a combined SNIC–RF model. They achieved high accuracy across several LULC classes and showed feature importances for each class in their analysis. However, the authors noted that, other than SNIC, advanced object-based classification and segmentation algorithms were not available on GEE.

3.2.3. Forest and Deforestation Monitoring

Forest and deforestation monitoring is the third-most-developed application using GEE and AI (20 studies total). Table 3 below summarizes those studies and a word cloud generated from the titles and keywords of those studies is provided in Figure 8. The most frequently used words are “Google Earth Engine” and “forest change”. A significant amount of research is done in monitoring deforestation over time and differentiating forest cover from plantations of oil palm, for example. Thus, the words “deforestation”, “plantation”, “palm oil”, and “time series” feature prominently in the word cloud. Additionally, Landsat, Sentinel-1, and Sentinel-2 data are frequently used in forest and deforestation monitoring research in tropical forests in places like the Amazon and Myanmar. From our interactive web app (see Appendix A) and Table 3, the most-used RS datasets are Landsat 8 OLI, SRTM DEM, and Google Earth. The most popular AI models are RF, SVM, and CART, and the most-used evaluation metrics are OA, PA, UA, and Kappa. A brief summary of those studies is provided right below Table 3. More detailed textual summaries for most of the reviewed forest and deforestation monitoring studies are provided in Appendix C.3.
Forests provide many ecosystem services, from preventing soil erosion and regulating the hydrological cycle to providing shelter for many plant and animal species. However, deforestation is occurring at a rate that is making it impossible for individual species to recover. As deforestation accelerates, there are cascading effects for entire ecosystems. In Brazil, agriculture, ranching, and land occupation are causing the vast forest of the Amazon to become fragmented. Still, it is difficult to monitor the changes through time due to cloud cover and the rate at which new satellite imagery arrives every day. The authors in [117] showed how GEE can be used to overcome data storage and compute needs and analyze about 20 years’ worth of Landsat data to determine forest cover changes. Land-use maps can help inform policymakers and land-use managers but are often static and of coarse resolution. It would be more useful to create these maps in a repeatable manner, one in which code and data could be reused for making decisions based on up-to-date information. Sentinel-2 data were analyzed in [120] and several different ML classifiers were trained to distinguish between four different forest types in Italy during both summer and winter seasons. Monitoring tree species distribution is an important metric in monitoring overall forest health and in determining current carbon storage efforts. However, doing so is difficult without the use of high-resolution RS data, much of which is either private and inaccessible or too expensive to collect (in the case of LiDAR or UAS data). Recent research in environmental mapping applications uses DL and NNs to identify tree species across large areas with minimal feature engineering, but NNs currently need large amounts of compute and labeled input data to train on.
To classify tree species across a large area in China while fitting within compute constraints, an RF was trained on the GEE platform in [121] using optical and SAR imagery, DEM data, and field observations.
A participatory forest mapping methodology was developed and tested in [109] to map the extent and species composition of forest plantations in the Southern Highlands area of Tanzania. Field observations of plant phenology can be time- and labor-intensive to obtain repeatedly. RS imagery can help continuously monitor phenology information because of its high spatial and temporal resolution. To create a forest type map of India using RS imagery and ML, the authors in [123] predicted for evergreen and deciduous forest types, as well as “non-forest” classes. Collecting, storing, and processing large amounts of RS imagery presents a barrier to doing research in the environmental and earth sciences. GEE provides data storage, compute, data processing, and ML algorithms on its platform. The researchers in [118] used GEE to map mangrove extent in Indonesia.
Global forests face many anthropogenic threats, one of the most prominent being the conversion to agriculture. This is often done through deforestation by fire or clearcutting, but a less studied mechanism related to forest health is the slow degradation caused by continual disturbances. This is more difficult to track using EO methods like vegetative indices derived from RS data, because forest degradation does not always result in canopy loss. The tropical forests of Ghana, many of which are in protected areas, still suffer from logging, mining activities, fires, and expanding agriculture production. All of these disturbances contribute to declines in forest health; a method was developed in [125] for monitoring tropical forest loss and recovery based on Landsat data. Forest logging and forest fires are both dominant drivers of forest loss worldwide. However, deforestation monitoring efforts are often limited by low-resolution RS data and the inability to create forest maps on a continual basis. Using SAR data as input and high-resolution optical data as validation data, a U-Net was trained on Google Cloud in [124] to create monthly forest loss maps.
Forest degradation and deforestation have been occurring around the globe during the past decades, threatening the biodiversity of ecosystems. As a highly forested landscape, southern Belize has been experiencing deforestation due to agricultural expansion. Landsat 8 imagery on GEE was utilized in [108] to perform a supervised classification. An MLP model was then built to predict future deforestation patterns and magnitude based on the drivers of past deforestation patterns in the region. Deforestation is accelerating in the tropics, in part due to industrial oil palm plantation expansion. Being able to monitor illegal deforestation can aid in conserving the remaining forested landscape and is thus critical to maintaining biodiversity and ecosystem services. A low-cost method was demonstrated in [107] for monitoring industrial oil palm plantations in Indonesia using Landsat 8 imagery, allowing the authors to distinguish between oil palm (immature oil palm, mature oil palm), forest, clouds, and water classes using the CART, RF, and MD algorithms. Oil palm can play a key role in both ecosystems and local economies but is also a common cause of deforestation. Thus, monitoring and managing the oil palm industry is often necessary to ensure that deforestation does not occur, but traditional ground surveys are time-, effort-, and cost-intensive. RS imagery can help monitor large areas over time. In [115], several ML models were compared to map oil palm using 30 m Landsat 8 imagery in Malaysia. Tropical forests in different Sub-Saharan African countries face high rates of deforestation due to illegal logging and cropland expansion. Being able to monitor tropical deforestation is important for tracking ecosystem balance and health. However, separate efforts to produce deforestation maps or products use different data sources, describe slightly different things, or lead to diverging land cover estimates, making them difficult or impossible to directly compare.
To explore how GEE could be used to create an open-source processing pipeline for deforestation mapping in Liberia and Gabon, two different RF models were used in [116] to create data masks and then predictions for various land types there. The Amazon Rainforest is home to much of the world’s biodiversity and plays an important role in natural carbon sequestration. However, this region is experiencing high rates of deforestation due to the expansion of agriculture and cattle farming. It remains challenging, though, to monitor such a large area given its size and biological complexity and to use that information to produce forest change projections. In [122], an RF was used for initial LULC classification, and an MLP was then used to simulate possible future deforestation scenarios.
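A driver-based deforestation model of the kind used in [108,122] can be sketched with scikit-learn’s MLPClassifier. The drivers (distance to road, slope), the decision rule, and all values below are synthetic assumptions for illustration:

```python
# Sketch of an MLP trained on deforestation driver variables. The drivers
# (distance to road, slope), the labeling rule, and all values are synthetic;
# a real study would use observed deforestation labels and many more drivers.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
n = 500
dist_road = rng.uniform(0, 10, n)    # km to nearest road
slope = rng.uniform(0, 30, n)        # terrain slope in degrees
X = np.column_stack([dist_road, slope])
# Toy rule: deforestation is most likely near roads on gentle slopes.
y = ((dist_road < 3) & (slope < 15)).astype(int)

# Standardize the drivers, then fit a small MLP.
Xs = StandardScaler().fit_transform(X)
mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=4)
train_acc = mlp.fit(Xs, y).score(Xs, y)
```

Applying the fitted model to driver rasters for a future scenario (e.g., a planned road network) would yield a projected deforestation-risk map.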
Mapping how much carbon forests sequester remains difficult because current techniques rely on mapping forested versus deforested landscapes. A major source of uncertainty stems from the fact that degraded forests, those open to selective logging, are not treated as a separate class yet can emit substantial carbon even though they are counted as forested regions. This issue was addressed in [15] by mapping disturbed forest areas in Brazil using 27 years of Landsat surface reflectance imagery.

3.2.4. Vegetation Mapping

Vegetation mapping is a well-developed application domain using GEE and AI (18 studies total). Table 4 below summarizes those studies and a word cloud generated from the titles and keywords of the 18 papers is provided in Figure 9. The most frequently used words are “Google Earth Engine”, “Landsat”, and “vegetation mapping”. Plant dynamics (i.e., phenology and land-use changes over time) play an important role in differentiating vegetation from forest cover. From our interactive web app (see Appendix A) and Table 4, the most-used RS datasets are Landsat 8 OLI, Landsat 5 Thematic Mapper (TM), and Landsat 7 Enhanced Thematic Mapper Plus (ETM+). The most popular AI models are RF, CART, and SVM, and the most-used evaluation metrics are PA, UA, and OA. A brief summary of those studies is provided right below Table 4. More detailed textual summaries for most of the reviewed vegetation mapping studies are provided in Appendix C.4.
Accurate near real-time estimates of vegetation cover and biomass are critical to adaptive rangeland management. An approach was developed and tested in [129] to automate the mapping and quantification of vegetation cover and biomass using Landsat 7 and Landsat 8 imagery across the grazing season (i.e., changing phenological conditions). Annual percent land cover maps of plant functional types across rangeland ecosystems were produced to effectively and efficiently respond to pressing challenges facing conservation of biodiversity and ecosystem services. The authors in [130] utilized the historical Landsat satellite record, gridded meteorology, abiotic land surface data, and over 30,000 field plots within an RF model to predict per-pixel percent cover of annual forbs and grasses, perennial forbs and grasses, shrubs, and bare ground over the western United States from 1984 to 2017, at approximately 30 m resolution. Rangelands in the western United States are home to many different animal and plant species. They are ecologically diverse and have been traditionally monitored by taking and analyzing in situ measurements in different areas. However, continually collecting field observations can be time- and labor-intensive and land managers are often asked to make decisions about large areas with sparse field information. RS data can help monitor rangelands with a large spatial scope and a short return time, making them key to informing land management decisions in a timely manner. Using climate and field data alongside Landsat imagery and MODIS land-use maps, ML models used in [21] were able to predict for several important rangeland indicators like plant height, total vegetation and rock cover, as well as bare soil.
Invasive species can degrade ecosystems and harm biodiversity as well as soil and water quality. It is often difficult to monitor invasive species in coastal environments from optical RS imagery, though, because of frequent cloud cover. A specific invasive species in China was used in [136] as a case study for developing an ML pipeline that takes into account both cloud cover and phenological information. Invasive species can have harmful environmental effects as they disrupt ecosystem balances. Long-term datasets, like those of the grass S. alterniflora, are not always available, making such species difficult to detect using RS methods. In order to produce a map of this invasive species, field data were collected and processed in [139] in addition to UAS imagery and optical RS data from several different platforms.
It is often difficult to detect changes in savanna landscapes due to their high heterogeneity in vegetation types, which makes it even harder to attribute change to natural or anthropogenic causes. This is especially problematic in areas like the Brazilian Cerrado, where agricultural expansion is happening on a large scale. In order to clarify what changes have been happening there, over three decades’ worth of Landsat imagery was used in [135] to determine which areas have experienced vegetation change. Wetlands provide many ecosystem services and important habitats for several different plant and animal species. In order to make informed conservation and policy decisions, it is important not only to be able to map the current state of wetland vegetation, but also how that vegetation is changing over time. However, the different sets of input data and ML methods used for change detection of wetland vegetation need to be evaluated more fully, as choices made during preprocessing and hyperparameter tuning can affect the end result of an analysis. The authors in [138] used an adaptive stacking algorithm to train an ML classifier on optical, SAR, and DEM data to identify wetland vegetation.
Seagrasses provide many ecosystem services, from carbon storage and habitat for many marine species to the prevention of coastal erosion. However, they are in decline due to anthropogenic impacts. Mapping their extent is key to being able to conserve them. Bathymetry and RS data were combined in [127] to create a processing and analysis pipeline for large-scale seagrass habitat monitoring in Greece using GEE. Grasslands are often integrated into land-use type or cropland-specific maps, even high-resolution products. However, different grassland species are not identified and thus are classified as a single homogeneous land or crop type. This is a problem not just because previous maps have not separated out different grassland types, but also because such types are difficult to recognize in RS imagery owing to their spectral similarity. Some experts are able to recognize such classes, but it is time-consuming to analyze grassland types at scale. Thus, DL techniques that do not rely on expert knowledge are needed so that these identification systems can work over large areas over time. A CNN–LSTM hybrid model was used in [132] to identify grassland types in Sentinel-2 imagery in the United States.
Feature engineering is important in ML, but it is labor-intensive and often requires domain expertise [1]. As one branch of ML, DL does not need feature engineering, as deep NNs learn features from large, annotated data examples, but DL requires much larger training datasets than traditional ML [1]. The authors in [43] addressed this issue by comparing the performance of an RF model with feature engineering to LSTM and U-Net NN models without feature engineering for identifying pasturelands in Brazil. Monitoring vegetation on a large spatial scale can be difficult because field data collection takes only snapshots in time and is labor-intensive and expensive. Instead, methods for measuring vegetation need to be applied repeatedly over time so that change detection is possible. Still, novel methods, such as those utilizing RS imagery, need to meet current governmental quality standards. An example of how this can be done is illustrated in [126] in Australia using the GEE platform by comparing several ML classifiers to index-based methods like NDVI. Although coastal wetland systems are critical habitats for different animal and plant species, it is difficult to monitor them due to cloud cover and the difficulty of obtaining RS imagery at high and low tides. Previous studies have used single images or spectral time series to try to identify wetland vegetation, but coastal wetland environments are complex ecosystems. The same species of plant can look different at different stages of its life while also being submerged under water in some RS scenes. The authors in [140] argued that phenology information in RS time series can better capture tidal flat wetland vegetation and so compared phenology information to statistical (min, max, median) and temporal features (quartile ranges). Mapping plant functional types is important because it can give ecosystem modelers and environmental planners a better idea of the spatial distribution of vegetation.
This in turn has implications for how resilient areas and ecosystems are, and will be, to changing climatic factors such as heat stress. However, plant functional type classification relies on, and is often derived directly from, current LULC map products that can themselves contain inaccuracies. To explore how plant functional types can be derived directly from RS information, the authors in [137] trained an RF model on field, DEM, MODIS, and climate data.
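The statistical and temporal features compared in [140] (per-pixel minimum, maximum, median, and quartile ranges of a spectral-index time series) reduce to a few lines of array arithmetic. The sketch below is illustrative; the function and array names are ours, not from the cited study:

```python
import numpy as np

def time_series_features(stack):
    """Per-pixel summary features from a spectral-index time series.
    stack: array of shape (T, H, W), e.g., an NDVI time series.
    Returns a (5, H, W) array: min, max, median, interquartile
    range, and total range."""
    f_min = np.nanmin(stack, axis=0)
    f_max = np.nanmax(stack, axis=0)
    f_med = np.nanmedian(stack, axis=0)
    q25, q75 = np.nanpercentile(stack, [25, 75], axis=0)
    return np.stack([f_min, f_max, f_med, q75 - q25, f_max - f_min])

# Toy example: 4 acquisition dates over a 2 x 2 pixel window
ts = np.array([[[0.1, 0.2], [0.3, 0.4]],
               [[0.2, 0.3], [0.4, 0.5]],
               [[0.3, 0.4], [0.5, 0.6]],
               [[0.4, 0.5], [0.6, 0.7]]])
feats = time_series_features(ts)  # shape (5, 2, 2)
```

In practice these per-pixel features would be stacked as predictor bands for a classifier, exactly as phenological features are in the studies above.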
Many methods have been developed to estimate vegetative properties from RS imagery in response to environmental changes. One such method, Gaussian Process Regression (GPR), is increasingly used because it is a transparent ML model that also outputs model uncertainties. However, as environmental and earth scientists move to GEE for finding and processing data, they may find a lack of ready-to-use or trainable GPR models, most likely because GPR models become slow and memory-intensive when trained on large RS image time series. A GPR model optimized for retrieving green LAI from RS imagery was implemented in [141] in a way that is also computationally efficient on GEE.
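A minimal NumPy sketch of GPR illustrates why the model is attractive here: the posterior yields a predictive variance alongside each estimate. This toy version with a squared-exponential kernel is not the optimized implementation of [141]; all names and hyperparameters are illustrative:

```python
import numpy as np

def rbf_kernel(a, b, length=1.0, var=1.0):
    """Squared-exponential (RBF) covariance between 1-D inputs."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return var * np.exp(-0.5 * d2 / length ** 2)

def gpr_predict(x_train, y_train, x_test, noise=1e-2):
    """GP posterior mean and variance at x_test (standard equations)."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_train, x_test)
    Kss = rbf_kernel(x_test, x_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(Kss - v.T @ v)
    return mean, var

# Fit a smooth curve (a stand-in for a LAI time series)
x_train = np.linspace(0.0, 5.0, 20)
y_train = np.sin(x_train)
mean, var = gpr_predict(x_train, y_train, np.array([2.5, 10.0]))
# var is small where training data are dense (x = 2.5) and grows far
# from the data (x = 10) -- the uncertainty output noted above
```

The cubic cost of the Cholesky factorization in the number of training points is also what makes naive GPR slow on large RS time series, motivating the optimizations in [141].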

3.2.5. Water Mapping and Water Quality Monitoring

Water mapping and water quality monitoring constitute another well-developed application domain using GEE and AI (18 studies total). Table 5 below summarizes those studies and a word cloud generated from the titles and keywords of the 18 papers is provided in Figure 10. The most frequently used keywords are “Google Earth Engine”, “surface water”, and “change”. Change detection is very important in water mapping research, though it first requires the ability to map the water itself. Sentinel-1 data are used almost on par with Landsat data in the water mapping literature; hence, the two terms appear at almost the same size. From our interactive web app (see Appendix A) and Table 5, the most-used RS datasets are Landsat 5 TM, Landsat 7 ETM+, and Landsat 8 OLI. The most popular models are RF, multiple linear regression, Modified Normalized Difference Water Index (MNDWI), and SVM, and the most-used evaluation metrics are R2, Kappa, and OA. A brief summary of those studies is provided right below Table 5. More detailed textual summaries for most of the reviewed water mapping studies are provided in Appendix C.5.
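The two evaluation metrics that recur throughout this review, overall accuracy (OA) and Cohen's kappa, are both simple functions of the error (confusion) matrix. A small illustrative implementation (the toy confusion matrix is invented for this example):

```python
import numpy as np

def overall_accuracy(cm):
    """OA: fraction of correctly classified samples (trace / total)."""
    cm = np.asarray(cm, dtype=float)
    return np.trace(cm) / cm.sum()

def cohens_kappa(cm):
    """Cohen's kappa: observed agreement corrected for the agreement
    expected by chance, from a confusion matrix with reference
    classes on rows and predicted classes on columns."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    po = np.trace(cm) / n
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2
    return (po - pe) / (1.0 - pe)

# Toy two-class (water / non-water) confusion matrix
cm = [[50, 10], [5, 35]]
oa = overall_accuracy(cm)   # 0.85
kappa = cohens_kappa(cm)    # ~0.69, lower than OA after chance correction
```

Kappa is typically lower than OA because it discounts agreement that would occur by chance given the class proportions, which is why many studies report both.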
Static surface water maps are often produced at the regional or national level but do not show long-term trends resulting from seasonality or the effects of global warming. In [32], the authors created a web portal using GEE as a backend, alongside an expert system, to identify bodies of water in Landsat imagery. RS has been widely used to map and monitor surface water. In [142], the authors used all available Landsat images to study surface water dynamics in Oklahoma from 1984 to 2015. They found significant inter-annual variations in both the number of surface water bodies and the total surface water area, each showing a positive relationship with precipitation and a negative relationship with temperature.
Floods and heavy precipitation events often coincide with heavy cloud cover, making optical imagery poorly suited to water mapping or flood monitoring at those times. Traditionally, ground-based gauges are used to monitor water level and stream flow, but they only measure at specific points, limiting their utility during large-scale flood events. SAR imagery, by contrast, is often used in water mapping and flood monitoring because it can see through clouds and cover large spatial scales. This is especially important for monsoonal regions such as Southeast Asia, where intense rains can lead to flood conditions. However, SAR imagery is also susceptible to classification errors when flooding occurs under tree cover or resembles concrete/pavement in urban areas, so preprocessing steps should be chosen carefully. The authors in [150] analyzed the degree to which different preprocessing steps affect the output water maps, using both SAR and DEM data and two variations of Otsu’s thresholding algorithm. Glacial lake outburst floods (GLOF) are among the most serious natural hazards in the Himalayan region, and information about the location and spatial distribution of glacial lakes is critical to reducing their potential risks. In [143], the authors used Landsat 8 images available on GEE to map glacial lakes in the Tibet Plateau region. Their results revealed that climate warming played a major role in glacial lake changes.
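Otsu's algorithm, the thresholding method applied to SAR backscatter in [150], selects the histogram threshold that maximizes between-class variance. A sketch of the classic histogram version (the synthetic backscatter values are illustrative, not data from the cited study):

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Otsu's method: choose the threshold that maximizes the
    between-class variance of a (roughly bimodal) histogram.
    For SAR backscatter, water is the low-valued mode."""
    hist, edges = np.histogram(values, bins=bins)
    hist = hist.astype(float)
    centers = (edges[:-1] + edges[1:]) / 2.0
    total = hist.sum()
    w0 = np.cumsum(hist)           # weight of class below threshold
    w1 = total - w0                # weight of class above threshold
    m = np.cumsum(hist * centers)  # cumulative first moment
    mu0 = np.divide(m, w0, out=np.zeros_like(m), where=w0 > 0)
    mu1 = np.divide(m[-1] - m, w1, out=np.zeros_like(m), where=w1 > 0)
    between = w0 * w1 * (mu0 - mu1) ** 2
    return centers[np.argmax(between)]

# Synthetic bimodal "backscatter" (dB): water near -20, land near -8
rng = np.random.default_rng(0)
vals = np.concatenate([rng.normal(-20, 1, 1000), rng.normal(-8, 1, 1000)])
t = otsu_threshold(vals)  # lands near the valley between the modes
water = vals < t          # low-backscatter pixels flagged as water
```

Because the threshold is recomputed per scene from the histogram, the method adapts to local backscatter conditions, which is what makes it attractive for large-scale flood mapping.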
Categorizing urban water resources faces two main challenges. First, it is often difficult to distinguish water from features such as asphalt or shadows in urban RS imagery. Second, the distribution of water resources keeps changing alongside the accelerating impacts of climate change, making up-to-date, temporally aware water monitoring difficult. GEE provides free data storage, datasets, and compute, but high-accuracy DL models such as NNs are not yet natively available on the platform. In [151], the authors compared the performance of MNDWI and an RF to that of a multi-scale CNN (MSCNN) and showed that the DL method was the most accurate (with fewer misclassifications) for identifying urban water resources in several Chinese cities. While DL receives much attention in water mapping research, these models still require large amounts of input data and compute to train. As compute becomes publicly available through cloud-based platforms such as GEE, obtaining large amounts of labeled training data remains a key bottleneck to using DL models. One way to make data labeling less time- and resource-intensive was illustrated in [156], where the authors used existing water maps and a segmentation algorithm to automatically collect data labels from Sentinel-1 imagery.
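MNDWI, one of the baselines in [151], is a simple band ratio: (Green - SWIR1) / (Green + SWIR1), with water typically scoring above zero. A minimal sketch (the toy reflectances are invented for illustration):

```python
import numpy as np

def mndwi(green, swir1):
    """MNDWI = (Green - SWIR1) / (Green + SWIR1); water pixels
    typically score above zero because water reflects relatively
    more in the green band and strongly absorbs shortwave infrared."""
    green = np.asarray(green, dtype=float)
    swir1 = np.asarray(swir1, dtype=float)
    return (green - swir1) / (green + swir1 + 1e-10)

# Toy reflectances: one water pixel, two non-water pixels
green = np.array([0.10, 0.08, 0.25])
swir1 = np.array([0.02, 0.20, 0.30])
idx = mndwi(green, swir1)
water_mask = idx > 0  # simple zero threshold as a starting point
```

The fixed zero threshold is exactly what breaks down for asphalt and shadow in urban scenes, which motivates the learned classifiers compared above.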
Optical imagery used in surface water mapping analyses is often occluded by clouds, and many common methods used to map surface water confuse snow, ice, rock, and shadows as water. DeepWaterMapv2 was released in [147] and aimed to address these false positive misclassifications.
ML models have achieved high accuracy in identifying water bodies in RS imagery. However, these models often misclassify soil, rock, clouds, ice, and shadow as water and often rely on cloud-free optical RS imagery, which is not always available. The authors in [157] used masking, filtering, and segmentation algorithms to identify bodies of water in complex, mountainous environments in Sri Lanka. It is challenging to repeatedly produce up-to-date, accurate surface water maps over large areas: water bodies change their shape and overall distribution through time, and human uses of water look dissimilar to natural water bodies in RS imagery. Most studies to date focus on one type of water body (lakes, rivers, etc.) or create a binary classification mask that gives little to no detail on water body subtypes. To explore the potential to distinguish between surface water body subtypes, the authors in [158] used slope, shape, phenology, and flooding information as input to an RF model to classify lakes, reservoirs, rivers, wetlands, rice fields, and agricultural ponds.
The authors in [144] proposed a new method for quickly mapping yearly minimal and maximal surface water extents. In [148], the authors integrated the global surface water (GSW) dataset with the SRTM DEM to determine the spatiotemporal patterns of water storage changes in China’s lakes and reservoirs. Multitemporal, multispectral satellite observations from the Landsat program and the Sentinel constellation are particularly useful in fluvial geomorphology, in which river channel mapping and the analysis of planimetric change have long been a focus. The authors in [154] demonstrated a workflow showing how GEE can be used to extract active river channel masks from a section of the Cagayan River (Luzon, Philippines).
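The idea of yearly minimal and maximal water extents in [144] can be illustrated with per-pixel water occurrence frequency over a stack of binary water masks. The thresholds and names below are illustrative, not those of the cited study:

```python
import numpy as np

def water_extents(masks, perm_frac=0.9, seas_frac=0.1):
    """From a stack of binary water masks (T, H, W), compute per-pixel
    water occurrence frequency and derive an approximate maximal
    (flooded at least occasionally) and minimal (flooded nearly
    year-round) surface water extent."""
    masks = np.asarray(masks, dtype=float)
    freq = masks.mean(axis=0)
    maximal = freq >= seas_frac
    minimal = freq >= perm_frac
    return freq, maximal, minimal

# Toy stack: 10 observations of a 1 x 3 strip; pixel 0 is permanent
# water, pixel 1 is seasonal (3 of 10 dates), pixel 2 is dry land
masks = np.zeros((10, 1, 3))
masks[:, 0, 0] = 1
masks[:3, 0, 1] = 1
freq, maximal, minimal = water_extents(masks)
```

The same frequency-compositing idea underlies the GSW occurrence layers used in [148], where per-pixel occurrence percentages are precomputed across the Landsat archive.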
Satellite RS can be used to estimate chromophoric dissolved organic matter (CDOM) as a riverine constituent that influences optical properties in surface waters. CDOM absorption is a common proxy for dissolved organic carbon (DOC) concentrations in inland waters, including Arctic rivers. The authors in [146] stated that this was the first study using GEE for RS of water quality parameters in inland waters. Collecting field data for monitoring water quality can be costly in terms of money, time, and effort. Additionally, traditional monitoring techniques do not extend over a large area and are often difficult to repeat over time. Satellite RS imagery can help monitor water quality at frequent intervals over large areas. To estimate water quality parameters like chlorophyll-a (Chl-a) concentrations, turbidity, and dissolved organic matter, [152] used ML and DL models to analyze RS imagery. Harmful algal blooms (HABs) have become a serious issue in freshwater ecosystems. RS has proven to be a cost-effective means for monitoring HABs. The authors in [153] developed a methodological framework for mapping Chl-a concentrations with multi-sensor satellite observations and in-situ water quality samples.

3.2.6. Wetland Mapping

Wetland mapping is one of the most well-developed applications using GEE and AI (16 studies total). Table 6 below summarizes those studies and a word cloud generated from the titles and keywords of those papers is provided in Figure 11. The most frequently used phrase is “Google Earth Engine”. Wetlands have many different subtypes and occur in both inland and coastal environments, so “wetlands” appears alongside terms like “tidal flats” and “coastal”. Most of the wetland mapping studies we reviewed take place in Canada and combine high-resolution RS imagery such as Sentinel-1 and Sentinel-2 to better distinguish between water and aquatic vegetation. From our interactive web app (see Appendix A) and Table 6, the most-used RS datasets in those studies are Sentinel-1, Sentinel-2, and Google Earth. The most popular models are RF, boosted regression trees (BRT), CART, and Simple Non-Iterative Clustering (SNIC), and the most-used evaluation metrics are OA, Kappa, PA, and UA. A brief summary of those studies is provided right below Table 6. More detailed textual summaries for almost all reviewed wetland mapping studies are provided in Appendix C.6.
Wetlands serve as the largest carbon pool globally and thus have important ecological service functions (e.g., water conservation, regulation, and the maintenance of species diversity) [173,174,175]. Over the past few decades, global climate change and human activities have posed dramatic challenges to wetland ecosystems, and wetland mapping is essential to conserving and managing terrestrial ecosystems [176]. RS makes it possible to investigate large wetland systems and monitor their change over time [177].
Wetlands are highly dynamic landscapes, which often renders past mapping efforts out-of-date. This is especially true at the regional or national level, where wetlands are difficult to monitor at scale due to their remote locations and large spatial extent. While there are efforts to monitor wetlands in Canada at the sub-regional and sub-provincial levels, these are mostly governmental efforts that produce static maps. Cloud computing on GEE was utilized in [35] to create an open-source, reproducible map of wetland occurrence probability using LiDAR and RS data for the entire area of Alberta. Mapping wetland subtypes is difficult because, while they look similar in RS imagery, they are diverse environments covering wide areas. The same is true for classifying peatlands, a wetland subtype that covers large geographic areas in complex patterns. This is problematic because peatlands, like wetlands more broadly, provide critical habitats that promote biodiversity while also acting as a global carbon sink. Past studies have shown that while optical data are useful for peatland mapping, they are often occluded by clouds or other atmospheric conditions. SAR data, on the other hand, can detect bodies of water and vegetation at any time of day or night but are prone to noise from surface moisture and roughness. The authors in [162] demonstrated that by combining SAR, optical, and LiDAR data on the GEE platform, a BRT model was able to predict peatland occurrence across Alberta with relatively high accuracy at high resolution.
Due to the difficulties in producing wetland inventory maps, whether from a lack of field data or the challenge of recognizing wetlands given their heterogeneous and fragmented nature, these maps are often only produced at a local level. Furthermore, because of the many local efforts involved, wetland inventories are often produced with different datasets and methods, limiting the ability of stakeholders to compare or combine maps. Meanwhile, anthropogenic activities are converting wetlands into agricultural or urban landscapes, and natural rain and flooding events are changing their spatial makeup. Thus, it is more important than ever to produce wetland inventory maps in order to monitor and protect existing wetlands. The authors in [161] used optical and SAR RS imagery to produce a 10 m resolution wetland map for the entire province of Newfoundland, Canada, using both an RF model and SNIC. Mapping environmental features like wetlands is the first step towards making informed decisions about conservation and restoration projects. However, how environments change over time is more relevant to policymakers, as this information would allow them to isolate how human activity has altered wetlands during different periods. The authors in [170] classified wetlands in Newfoundland during three different periods to show the spatial dynamics of these ecosystems. There have been several attempts to produce large-scale wetland inventory maps in Canada, although they often lack high spatial resolution and the ability to distinguish between wetland subtypes. There is also a lack of ground-truth field data, a common problem in ML applications in EO (there is overwhelmingly more unlabeled data than labeled data).
The authors in [17] proposed using field data collected from one Canadian province to create wetland inventory maps for several others, using a mix of optical, SAR, and digital elevation data.
Across Canada, wetland mapping is a well-studied topic. However, different local and regional agency wetland inventories use different monitoring techniques, or have altogether different definitions of what constitutes a wetland. Thus, even though several large-scale wetland maps have been produced, they are often not directly comparable. Additionally, these maps are often static and do not continually monitor wetlands through time. These are not the only barriers to mapping wetlands using RS imagery [165]: others include obtaining sufficient and recent field data to verify wetland monitoring products and the difficulty of monitoring such dynamic landscapes. Wetlands do not have clear-cut boundaries, are extremely diverse landscapes and ecosystems, and are often in flux throughout seasons and years due to flooding and drying. The study in [165] produced a high-resolution (10 m) wetland inventory map of Canada (an approximate area of one billion hectares) using multi-year, multi-source (Sentinel-1 and Sentinel-2) RS data and field samples on the GEE platform, showing that almost one-fifth of Canada is covered in wetlands. Wetlands provide a variety of ecological services and are a key habitat for many species, but human activity has significantly disturbed them as they are drained for urban or agricultural development. Monitoring their health is challenging because it would require repeated field measurements over wide areas. Researchers have used ML and RS data to do so, but the large amount of compute needed to map wetlands is often prohibitive. The authors in [160] analyzed a large number of field samples alongside Landsat imagery with an RF model to produce a wetland map for all of Canada. Wetland mapping and monitoring have been a challenging issue for the RS community during the past decades.
Unlike the United States, which has the National Wetlands Inventory, Canada lacked a national wetland inventory until recently. The authors in [168] proposed an object-based classification method to classify Sentinel-1 and Sentinel-2 data on the GEE cloud-computing platform, which resulted in the 10-m Canadian Wetland Inventory.
Large, inundated wetlands can be effectively mapped using RS imagery; small wetlands, or wetlands that are inundated only part of the time, are much more difficult to identify. Yet it is more important to do so now than ever, given that wetlands are rapidly being converted for agricultural use or are drying up due to climate-induced drying. Monitoring wetlands at large scales is possible, however, with the help of automated techniques such as ML. For example, NAIP imagery and LiDAR-derived DEM data were used in [163] to detect wetlands across the northern United States using unsupervised classification on the GEE platform. Identifying wetlands in RS imagery is the first step towards monitoring their health or decline in a new climate regime and making policy choices based on this information. To this end, spatially high-resolution sensors like LiDAR or data products like NAIP can help researchers identify wetlands in RS imagery but are not collected often enough to map wetlands at a fine temporal resolution. This is problematic because wetlands are dynamic ecosystems; they can be both wet and dry over the course of a single season. To overcome this limitation, Sentinel-1 and -2 imagery were combined in [171] with aerial photographs and field data to map the spatial variation of wetlands in portions of the United States over time. Environmental problems are often associated with land-use changes, but these changes are not solely linked to urban expansion; land-use change also negatively affects areas like coastal wetlands, which are not monitored as regularly. The possibility of using GEE to map coastal wetlands in Indonesia was explored in [159] by comparing all of the classifiers available on the platform and how they perform with Landsat, digital elevation, and Haralick texture data. The authors showed that in all cases, ML models performed much better at binary than multi-class classification.
Tidal flats, often referred to as coastal non-vegetated areas, are dynamic ecosystems, both because of their natural rhythms of water advance and retreat and because of anthropogenic change and rising sea levels. Because of how they change through time, tidal flats are difficult to monitor without multi-temporal, high-resolution RS imagery. With Landsat 8 and high-resolution Google Earth imagery, an RF model was used in [164] on GEE to classify tidal flat types and their distribution in China. The authors reported very high classification rates across tidal flat classes, although they noted that satellites like Landsat did not fully capture tidal ranges. Coastal wetlands are usually composed of coastal vegetation areas and tidal flats. Coastal tidal flats are natural transitions from terrestrial to ocean ecosystems and are vulnerable to anthropogenic activities and natural disturbances such as sea-level rise, land reclamation, and aquaculture. Many existing global land cover products include a wetland layer but do not explicitly differentiate coastal vegetation areas from coastal tidal flats (there is no specific layer for coastal tidal flats). The authors in [169] developed a pixel- and frequency-based approach to generate annual maps of tidal flats at 30-m spatial resolution in China’s coastal zone using Landsat TM/ETM+/OLI images and the GEE cloud computing platform. Tidal flats are unique ecosystems but are threatened by human disturbance and climate change. They are also difficult to identify in RS imagery because satellite platforms cannot capture intertidal variability given their infrequent return times. The authors in [172] addressed this limitation by first processing high-resolution RS and UAS imagery to map minimum and maximum water and vegetation extents, using Otsu’s thresholding algorithm to automatically detect the best threshold for each index.
These two indices were then combined in a composite that showed the total intertidal area in the RS imagery, to which the authors again applied the Otsu thresholding algorithm. The end result was a highly accurate map of tidal flats that did not require any post-processing.
Sebkhas are salty, unvegetated wetlands created when desert bodies of water become increasingly saline over time through water-loss mechanisms such as evaporation. They are home to specific species of vegetation and fish that can survive in saline environments, but their drainage networks are often underground, making them hard to identify in RS imagery. An RF model was used in [166] to identify water cavities where sebkhas form in Morocco. Wetland inventory maps are increasingly used to inform carbon pricing, ecosystem service valuation, and conservation/restoration decisions. Thus, it is important to build a repeatable processing pipeline that can ingest, process, and visualize data on a day-to-day basis so that monitoring and reporting programs (as in a government setting) have up-to-date, accurate information. To this end, there have been many studies identifying wetlands using RS imagery and ML, yet most of them cannot distinguish between wetland subtypes. This is challenging because fens, peatlands, bogs, marshes, and swamps can have very different vegetation types and structures, and it is important to distinguish between them because they each respond differently to human disturbance and changes in climate. The authors in [167] compared the performance of an XGBoost model to a CNN for wetland type classification.

3.2.7. Infrastructure and Building Detection, Urbanization Monitoring

Infrastructure and building detection and urbanization monitoring constitute the seventh-most-developed application domain using GEE and AI (11 studies total). Table 7 below summarizes those studies and a word cloud generated from the titles and keywords of those papers is provided in Figure 12. The most frequently used terms are “Google Earth Engine”, “urban”, “land”, “building”, and “impervious”. The vast majority of the studies in this domain take place in China and include both static mapping and change-detection applications. Infrastructure and urban area identification is often done by contrasting these classes with other LULC classes, so “vegetation” and “forest” also appear in the word cloud. From our interactive web app (see Appendix A) and Table 7, the most frequently used RS datasets are Landsat 8 OLI, Landsat 7 ETM+, and Google Earth. The most popular AI models are RF, CART, and SVM, and the most frequently used evaluation metrics are OA, Kappa, PA, and UA. A brief summary of those studies is provided below Table 7. More detailed textual summaries for some selected studies are provided in Appendix C.7.
Materials found in parking lots, roads, and buildings (i.e., concrete, asphalt) are classified as “impervious surfaces” in RS analyses and are often indicative of human development and urban extent. Impervious surfaces change the hydrological cycle and produce heat effects, affecting overall ecosystem health and well-being. To monitor these materials, researchers have tried using night-time lights to estimate their extent, but this approach leads to overestimates as light scatters. To investigate how best to identify impervious materials in RS imagery regardless of cloud cover, the authors in [182] combined nighttime light, DEM, and SAR data with an RF model on GEE. Their resulting maps were more accurate than commonly used products such as GlobeLand30. The authors in [180] put forward a new scheme to conduct long-term monitoring of impervious-relevant land disturbances using Landsat archives.
While greenhouses are used to grow food and help ensure food security, their proliferation can have environmental consequences. Previous attempts to classify greenhouses from RS imagery as part of LULC research have focused on small-scale proof-of-concept applications and have not emphasized identifying the structures in complex terrain types. To explore the possibility of identifying greenhouses in RS imagery over a large area in China, an ensemble ML model was designed in [185] to distinguish them from water, forest, farmland, and construction sites. Urban green spaces have a multitude of benefits, such as regulating urban climate, improving air quality, and reducing stormwater. RS has proven useful for studying the landscape structure of urban green spaces. The authors in [179] assessed the impact of urban form on the landscape structure of urban green spaces in 262 cities in China. The results revealed that cities with a high road density tended to have a smaller area of urban green spaces and be more fragmented. In contrast, cities with complex terrains tended to have more fragmented urban green spaces.
Rapid urban expansion around the world has led to worsening human and ecosystem health, affecting forests, air and water pollution levels, and overall biodiversity. However, currently available maps of urban settlements and their expansion are mostly static, whereas up-to-date information would enable better urban planning and land-use decisions. The authors in [186] designed a workflow for mapping urban sprawl over time in Brazil using an RF on the GEE platform. Increasing rates of urbanization put pressure on conservation targets and biodiversity as land previously occupied by ecosystems is converted into built-up areas. RS imagery makes it much easier for urban planners and researchers to monitor rates of urbanization and urban sprawl over wide areas. However, few labeled datasets are available for training ML models to identify buildings and built-up areas. To address this problem, a large, vectorized, ground-truth-verified dataset was created in [178] for India in order to train different ML models on GEE. A semi-automatic large-scale and long-time-series (LSLTS) urban land mapping framework was demonstrated in [183] by integrating crowdsourced OpenStreetMap (OSM) data with free Landsat images to generate annual urban land maps of the middle Yangtze River basin (MYRB) from 1987 to 2017.
Research on urbanization and urban sprawl often focuses on how urban spaces are replacing agricultural land and forested areas. Vegetation maps, on the other hand, are often produced using “urban”, “built-up areas”, or “impervious surfaces” as classes to predict, distinctly separating vegetation from zones of human habitation. Much less work has gone into monitoring vegetation prevalence and distribution within urban spaces themselves. This is an important and timely research topic given the environmental and psychological benefits people derive from access to green spaces within cities, such as stress reduction, better air quality, and lower temperatures. Using different vegetative indices (EVI, Gross Primary Production, etc.) derived from Landsat and MODIS data, the authors in [181] showed that urban sprawl in Shanghai had increased significantly over the last decade and a half.
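The vegetative indices used in such studies are straightforward band arithmetic; for example, NDVI and EVI in their standard formulations (the toy reflectance values below are invented for illustration):

```python
import numpy as np

def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red + 1e-10)

def evi(nir, red, blue):
    """EVI = 2.5 * (NIR - Red) / (NIR + 6*Red - 7.5*Blue + 1),
    the standard MODIS/Landsat coefficient set; the blue band
    corrects for aerosol influence."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)

# Toy surface reflectances: a vegetated pixel vs. a built-up pixel
nir = np.array([0.45, 0.20])
red = np.array([0.05, 0.18])
blue = np.array([0.04, 0.15])
v_ndvi = ndvi(nir, red)  # high for vegetation, near zero for built-up
v_evi = evi(nir, red, blue)
```

Applied per pixel and per date, such indices form the time series from which urban green-space trends like those in [181] are derived.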

3.2.8. Wildfires and Burned Area

Wildfire and burned-area mapping comprises fewer than 10 studies using GEE and AI (eight studies total). Table 8 below summarizes those studies and a word cloud generated from the titles, keywords, and abstracts of the eight papers is provided in Figure 13. The most frequently used words are “Google Earth Engine”, “burn/ed”, “fire(s)”, “forest”, and “change”, which show that the main focus of these studies is on monitoring forested areas pre- and post-fire. The data products “Sentinel-2”, “Landsat”, and MODIS maps like “MCD64A1” (a burned-area product) indicate that wildfire and burned-area mapping analyses rely mainly on optical imagery. From our interactive web app (see Appendix A) and Table 8, the most-used RS datasets are Landsat 8 OLI and MODIS. The most popular AI models are RF, SVM, and CART, and the most-used evaluation metrics are OA, commission error (CE), Kappa, omission error (OE), and R2. A brief summary of those eight studies is provided below Table 8. More detailed textual summaries for each of the eight studies are detailed in Appendix C.8.
Wildfires damage ecosystems and human health, in addition to releasing greenhouse gasses when they burn, and climate change is increasing the number of wildfires across the globe. The massive wildfires that hit Australia during the 2019–2020 summer season raised questions about the extent to which wildfire risk can be linked to various climate, environmental, topographical, and social factors, and how fire occurrences can be predicted so that preventive measures can be taken. An automated, cloud-based workflow was developed in [193] for generating a training dataset of fire events at a continental level using freely available RS data on GEE. Landscape fires have been a major natural hazard affecting West-Central Spain, and it is therefore critical to be able to map and characterize them. Using the LandTrendr (Landsat-based Detection of Trends in Disturbance and Recovery) and FormaTrend (Forest Monitoring for Action—Trend) algorithms on the GEE cloud-computing platform, a method was proposed in [190] for identifying fire-induced disturbances. Wildfires are a common occurrence in the Brazilian Cerrado, often determining and changing the natural plant species composition through burn cycles. However, the Cerrado has been undergoing increasing anthropogenic conversion into cropland and pasture, which has changed hydrological and biogeochemical cycles within this ecosystem. This in turn has led to changes in fire size, pattern, frequency, and severity, so it is more important than ever to create quick and reproducible methods for monitoring the fire landscape within the savannah. A completely cloud-based DL workflow combining Google Cloud and GEE was designed in [196] to classify burn scar areas in Brazil.
Traditional wildfire mapping via field surveys and digitization is time-consuming and hard to reproduce over time. Burned-area indices can be used to monitor post-fire landscapes and their subsequent recovery, but their thresholds are not dynamic and so perform differently in different locations. Sentinel-2 data were used in [195], along with two different burned-area and LULC maps, to train different ML classifiers (k-nearest neighbor (KNN), RF, SVM) to map wildfire damage in Australia. As the planet warms, forest fires are increasing in occurrence and severity, with negative consequences for ecosystems, biodiversity, and human health. To estimate the damage caused by forest fires and their subsequent recovery rates, RS imagery is needed to monitor forests and burn scars over large areas. However, most fire products to date are created with coarse RS imagery, making regional and local fire monitoring difficult. To determine the impact of using higher-resolution RS data products, the authors in [192] compared how Landsat and Sentinel optical imagery affected an ML model’s performance in burned-area classification.
Burned-area maps showing where wildfires have occurred are important for analyzing global wildfire trends. However, many burned-area maps derived from RS imagery come from the MODIS platform, and the 250 m spatial resolution of products like FireCCI51 leaves out a lot of detail, so the authors in [191] used CBERS, Gaofen, and Landsat imagery to create a 30 m burned-area dataset for 2015. The authors noted that their method had difficulty distinguishing burned areas from recently plowed fields in agricultural areas, so crop-type masks should be used to remove potential false positives. Additionally, Landsat data were used for both the data collection and validation stages; thus, despite their high accuracy rates, the authors were not able to independently assess the suitability of using Landsat imagery for data collection purposes. Later on, [194] adapted the same processing steps on GEE to produce a burned-area map for the year 2005, illustrating how sharing and storing code on GEE makes it easy to re-run analyses or adapt them for new use cases.
Satellite-derived spectral indices such as the relativized burn ratio (RBR) allow fire severity maps to be produced across multiple fires and broad spatial extents. To interpret fire severity in terms of on-the-ground fire effects more directly than non-standardized spectral indices allow, [189] produced a map of the composite burn index (CBI), a frequently used field-based measure of fire severity.
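As a concrete illustration of the index arithmetic behind such severity maps, the RBR builds on the normalized burn ratio (NBR), which is computed from NIR and SWIR reflectance. A minimal sketch (the band values below are illustrative, not taken from [189]):

```python
def nbr(nir, swir):
    """Normalized burn ratio from NIR and SWIR reflectance."""
    return (nir - swir) / (nir + swir)

def rbr(pre_nbr, post_nbr):
    """Relativized burn ratio: dNBR scaled by pre-fire NBR.

    The 1.001 offset keeps the denominator nonzero when pre_nbr == -1.
    """
    dnbr = pre_nbr - post_nbr
    return dnbr / (pre_nbr + 1.001)

# Healthy vegetation before the fire; char and bare soil after.
pre = nbr(0.40, 0.10)    # high NBR before the burn
post = nbr(0.15, 0.35)   # negative NBR after the burn
severity = rbr(pre, post)
```

Relativizing by the pre-fire NBR is what lets severity values be compared across fires burning in very different vegetation types.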

3.2.9. Heavy Industry and Pollution Monitoring

There are seven studies about heavy industry and pollution monitoring using GEE and AI. Table 9 below summarizes those studies, and a word cloud generated from the titles, keywords, and abstracts of the seven papers is provided in Figure 14. The most frequently used words form the phrase “Google Earth Engine”. Most applications in this area focus on monitoring reclamation or pollution at active or former mining sites, so “mine” and “mining” feature prominently in this word cloud. Several papers used the LandTrendr algorithm, after identifying mine sites, to monitor pollution, water levels, or vegetation changes through time. From our interactive web app (see Appendix A) and Table 9, the most-used RS datasets are Sentinel-2, Landsat 8 OLI, Landsat 5 TM, Landsat 5, and Google Earth. The most popular models are RF, CART, and LandTrendr, and the most-used evaluation metrics are Kappa, OA, PA, and UA. A brief summary of those seven studies is provided below Table 9. More detailed textual summaries for each of the seven studies are detailed in Appendix C.9.
Mining can cause substantial environmental degradation during active operations, and degradation often continues if mines are not properly reclaimed once they are no longer active. Field techniques for monitoring environmental damage operate on a limited spatial and temporal scale, failing to capture the full picture. RS can help monitor ecological changes during mining and verify that mining companies complete reclamation after operations stop. A mapping study was performed in [198] for mining areas in the Brazilian Amazon using Sentinel-2A images and the CART classifier in GEE. To monitor mining disturbances at a coalfield in Mongolia, the LandTrendr algorithm was used in [199] to analyze Landsat data; the authors designed a fast, efficient method on the GEE platform to monitor surface mining operations and showed that only 26% of promised reclamation was undertaken at the Shengli Coalfield. Heavy industry projects like mining normally require post hoc reclamation to ensure that local ecosystems can heal and regenerate. Monitoring former mining sites is made much easier with RS imagery because they are often large, spatially distributed ecological disturbances. This is especially the case for underground mining projects, where subsidence occurs but is difficult to track without an aerial view. Landsat imagery and the LandTrendr algorithm were utilized in [202] to monitor water accumulation in subsidence areas of past mining in China. Mining is economically important because of the jobs and materials it provides but is associated with various environmental and health risks. One such danger comes from the failure of tailings dams, which store water with toxic levels of waste solids. Even though these failures can cause significant damage to the environment, human health, and infrastructure, there is no global database of active tailings dams.
The absence of such a database can in turn make it easier for illegal mines to operate, since even legal mining operations with tailings dams are not heavily monitored. To keep track of mines and dams in Brazil, two different CNNs were used in [200]: one to classify potential mining sites and another to classify perceived/potential environmental risk.
As cities expand and develop, construction and demolition waste is often stored until it can be further processed, reused, or disposed of. Some of these waste piles are orderly and trackable, but many are not, making it hard to manage them and their potential negative environmental or social effects. Current methods for taking stock of waste piles and dump sites rely on field investigations, which demand considerable time, effort, and money. More work needs to be done to identify such sites using RS imagery and ML methods, but different ML methods and parameter settings can lead to different results. To test the efficacy of different ML algorithms for identifying waste and dump sites in optical imagery, the parameters of the CART, RF, and SVM algorithms available on GEE were optimized in [203].
Oil and gas pads are developed for production and then capped, reclaimed, and left to recover when no longer productive. Understanding the rates, controls, and degree of recovery of these reclaimed well sites to a state similar to pre-development conditions is critical for energy development and land management decision processes. The authors in [197] used time series of the Soil Adjusted Total Vegetation Index (SATVI), calculated from Landsat 5 imagery, to track changes and assess vegetation regrowth on 365 abandoned well pads across the Colorado Plateau. Previous estimates of particulate matter for the Canadian Air Pollutant Emissions Inventory (APEI) were based on exposed mine disturbance areas calculated using outdated mine area extents. Using the GEE JavaScript API, RF classifiers were used in [201] to produce maps of mine waste extents from the Landsat 8, Sentinel-1, and Sentinel-2 archives.
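Several of the studies above ([199,202]) rely on LandTrendr, which fits piecewise-linear segments to annual spectral trajectories and is exposed in GEE as `ee.Algorithms.TemporalSegmentation.LandTrendr`. A greatly simplified, pure-Python sketch of the core idea, locating the year of the sharpest drop in an annual spectral index (the series values below are illustrative, not from any reviewed study):

```python
def largest_drop_year(years, index_values):
    """Return the year of the largest year-over-year decrease in a
    spectral index (e.g., NDVI/NBR) -- a toy stand-in for LandTrendr's
    piecewise segmentation of annual trajectories."""
    drops = [index_values[i] - index_values[i + 1]
             for i in range(len(index_values) - 1)]
    worst = max(range(len(drops)), key=lambda i: drops[i])
    return years[worst + 1], drops[worst]

years = [2000, 2001, 2002, 2003, 2004, 2005]
ndvi = [0.71, 0.70, 0.72, 0.35, 0.40, 0.48]  # disturbance in 2003, slow recovery
year, magnitude = largest_drop_year(years, ndvi)
```

The real algorithm additionally models the recovery segments after the drop, which is what allows reclamation progress (or its absence) to be quantified over time.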

3.2.10. Climate and Meteorology

There are seven climate and meteorology studies using GEE and AI. Table 10 below summarizes those studies, and a word cloud generated from the titles, keywords, and abstracts of those papers is provided in Figure 15. The most frequently used words are “Google Earth Engine”, “changes”, and “climate”. There are also specific keywords for ocean-related studies, like “sea” and “salinity”, and for studies focused more on land and atmosphere characteristics, like “surface”, “land”, “temperature”, “LST”, and “albedo”. From our interactive web app (see Appendix A) and Table 10, the most-used RS datasets are Landsat 8, Landsat 5, and Sentinel-2. The most popular AI model is RF, and the most frequently used evaluation metrics are mean absolute error (MAE), OA, root mean square error (RMSE), and R2. A brief summary of those studies is provided below Table 10. More detailed textual summaries for each of the seven studies are detailed in Appendix C.10.
Forests store much of the world’s terrestrial carbon, but globally they are under threat from global warming and human disturbance. While forests release carbon immediately when they are cut down or otherwise disturbed, they also release carbon through secondary effects. This type of climate “memory”, or lag in carbon flux, is much less studied and thus not well understood. To study this mechanism further, the authors in [209] used an LSTM to model carbon fluxes in global forests and compared its performance to an RF.
Accurate satellite-derived albedo estimations are needed to parameterize and in turn to validate climate simulation models. MODIS satellite observations from 2000 to 2015 were analyzed in [204] using GEE to derive global snow-free land surface albedo estimations and trends at a 500 m resolution. A method was presented in [208] to obtain high-resolution sea surface salinity (SSS) and temperature (SST) by using Sentinel-2 Level 1-C Top of Atmosphere reflectance data. The consistency between Tropical Rainfall Measuring Mission (TRMM) multi-satellite precipitation and monthly gauged precipitation has been confirmed worldwide. A downscaling framework (from 25 km to 1 km) was proposed in [210] for TRMM precipitation products by integrating GEE and Google Colaboratory (Colab).
Furthermore, 30-m Landsat imagery has a long history of coverage across the Landsat 7 ETM+ and Landsat 8 OLI sensors. Sentinel-2 Multispectral Instrument (MSI) imagery has a higher resolution of 10 m and a faster revisit frequency (10 days instead of 16 days for Landsat). Being able to use all of these sensors together for a given EO analysis would greatly increase the available spatial and temporal resolution, but the sensors have differences that must be calibrated before they can be integrated. Indeed, a harmonized multi-sensor dataset is one of the most-requested resources we found in our review. Major-axis regression was performed in [205] on these datasets in pairs (Landsat 7 ETM+/Landsat 8 OLI, Landsat 7 ETM+/Sentinel-2 MSI, and Landsat 8 OLI/Sentinel-2 MSI) across the entire coterminous United States, and the authors were able to determine cross-platform correction coefficients for the Blue, Green, Red, NIR, and SWIR bands present in all three satellites.
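Once such coefficients are derived, harmonization reduces to a per-band linear transform. A minimal sketch with made-up slope/intercept values (placeholders for illustration, not the coefficients reported in [205]):

```python
# Hypothetical per-band correction: L8_equivalent = slope * L7_value + intercept.
# These numbers are placeholders, not values from the reviewed study.
CORRECTION = {
    "red": (0.98, 0.003),
    "nir": (1.02, -0.001),
}

def harmonize(band, reflectance):
    """Map a Landsat 7 ETM+ surface reflectance value onto the
    Landsat 8 OLI scale using a linear cross-sensor correction."""
    slope, intercept = CORRECTION[band]
    return slope * reflectance + intercept

red_l8 = harmonize("red", 0.10)
nir_l8 = harmonize("nir", 0.30)
```

The same transform applies pixel-wise to whole images, which is why such corrections are cheap to run at continental scale on GEE.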
Urbanization has changed the urban landscape and resulted in increasing land surface temperature (LST). In [207], the authors investigated the impacts of landscape changes on LST intensity (LSTI) in a tropical mountain city in Sri Lanka. There are several ongoing attempts to classify cities around the world based on characteristics like urban canopy cover, total built-up area, neighborhood sizes, and urban heat island effects (for example, see the Urban Atlas and the World Urban Database Access and Portal Tools (WUDAPT)). These datasets can help planners and policymakers make more informed decisions as they consider implementing sustainability measures in their cities. However, such spatial datasets often rely on surveying methods that must be continually updated. A cloud-based workflow was implemented in [206] and compared to the traditional method of using SAGA GIS for producing local climate zone maps of cities based on data like WUDAPT.

3.2.11. Disaster Management

There are six disaster management studies using GEE and AI. Table 11 below summarizes those studies, and a word cloud generated from the titles, keywords, and abstracts of the six papers is provided in Figure 16. The most frequently used words are “Google Earth Engine”, “recovery”, “landslide”, “post-disaster”, “hurricane”, and “damage”. Many studies focused on mapping buildings after flood or landslide events. One of the main challenges for disaster management research on GEE is the delay between when RS data are recorded and when they are uploaded to the platform. This limits the utility of time-sensitive research or applications deployed on GEE (hence the keywords “rapid” and “assess”). From our interactive web app (see Appendix A) and Table 11, the most-used RS dataset is Landsat 8. The most popular ML models are RF and CART, while the most frequent evaluation metrics are OA, PA, and UA. A brief summary of those studies is provided below Table 11. More detailed textual summaries for each of the six studies are detailed in Appendix C.11.
RS imagery has long been used to monitor community recovery after natural disasters, and decision makers can use RS-based analyses to redirect resources during the recovery process. Even so, many studies focused on disaster recovery use VHR imagery, which increases data storage and compute needs. To explore the suitability of GEE for disaster recovery, the authors in [215] used an RF model trained on Landsat imagery to perform change detection on pre- and post-disaster areas in the Philippines. Building detections in post-disaster scenes are a valuable resource for timely damage assessment in disaster management. Using RGB images as input, an automatic building detection method was proposed in [216] to find buildings and their irregularities in pre- and post-disaster (sub-)meter resolution images.
Landslides are a major natural hazard in mountainous regions. Traditionally, landslide mapping relies heavily on field surveys and visual interpretation of satellite imagery. A new method was proposed in [211] for mapping landslides in Nepal using RF on GEE. Many agricultural landscapes have incorporated drainage systems to stop fields from flooding during heavy precipitation and runoff. These underground drainage networks have made flood forecasting harder, since it is more difficult to track water in space and time and drainage networks are not always well mapped. The authors in [212] created drainage maps by running an RF model on the GEE platform, analyzing vegetation, thermal, moisture, and climate datasets along with surface drainage records.
Producing flood maps is critical for giving advance warning to those in affected areas. However, producing these maps in real time is hindered by the fact that many mapping applications cover too small an area due to a lack of computational resources. The authors in [214] presented a case study of the 2018 Kerala flood in India. They demonstrated how GEE can be used to process large optical and SAR RS datasets, in conjunction with field and precipitation data, using image processing techniques to produce high-resolution flood maps over a large area. Flood forecasting in Bangladesh currently involves running hydrological inundation simulations based on DEMs to produce early warning notifications. However, these simulations are compute-intensive and require access to high-resolution, up-to-date DEMs. Cloud-based mapping using RS imagery has the potential to provide quicker inundation forecasts over a large spatial area. This is especially true for analyses that utilize SAR imagery, since floods are often caused by heavy rains that leave optical imagery obscured by clouds. The authors in [213] produced flood maps in Bangladesh, taking advantage of the readily available data and free compute on the GEE platform.
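A common first step in SAR-based flood mapping of this kind is a simple backscatter threshold, since smooth open water scatters little radar energy back and appears dark in Sentinel-1 imagery. A minimal sketch (the -20 dB VH threshold is a typical illustrative value, not one taken from [213] or [214]):

```python
def flood_mask(vh_db_pixels, threshold_db=-20.0):
    """Label pixels as water where Sentinel-1 VH backscatter (in dB)
    falls below a threshold; smooth water returns little energy."""
    return [value < threshold_db for value in vh_db_pixels]

# A toy row of VH backscatter values (dB) from a flooded scene.
scene = [-12.5, -23.1, -19.9, -26.4, -8.0]
mask = flood_mask(scene)
flooded_fraction = sum(mask) / len(mask)
```

In practice, the threshold is usually chosen adaptively per scene (e.g., from the backscatter histogram), and a pre-flood reference image is differenced to separate permanent water from new inundation.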

3.2.12. Soil

There are six soil studies using GEE and AI. Table 12 below summarizes those papers, and Figure 17 below is the word cloud generated from the titles, keywords, and abstracts of the six soil studies. The most frequently used words are characteristics that soil researchers try to monitor: “SOM”, “organic”, and “stocks” for soil organic matter, as well as “litter”, “moisture”, “thermal”, and “salinity”. MODIS and Sentinel-2 feature heavily in the word cloud because they are the most-used data products (alongside Landsat). From our interactive web app (see Appendix A) and Table 12, we found that the most-used RS datasets are Landsat 8 OLI and SRTM DEM. The most popular ML models are RF and CART, while the top evaluation metrics are R2 and RMSE. A brief summary of those studies is provided below Table 12. More detailed textual summaries for each of the six studies are detailed in Appendix C.12.
Many authors come to GEE curious to test out the new cloud computing platform for their domain-specific application. GEE provides freely available compute and data to interested researchers, which they then use to explore the strengths and limitations of GEE. An early soil mapping study was performed in [217] on GEE in 2015. Collecting field samples for soil mapping can be time- and labor-intensive and can be bound to small areas given their costs. These data collections also need to be repeated, representing a barrier to presenting up-to-date information that covers large spatial areas to decision-makers. To address these issues, the authors in [219] used field observations, DEM data, and Landsat imagery on GEE to map different soil types and soil attributes across a large region in Brazil.
Soil plays a critical role in the carbon and water cycles, along with providing areas for habitat or agricultural use. The spatial distribution of litter and soil carbon (C) stocks is important for greenhouse gas estimation and reporting and informs land management decisions, policy, and climate change mitigation strategies. The effects of spatial aggregation of climatic, biotic, topographic, and soil variables on national estimates of litter and soil C stocks were explored in [220]. The authors also characterized the spatial distribution of litter and soil C stocks in the conterminous United States (CONUS); litter and soil variables were measured on permanent sample plots from the National Forest Inventory (NFI) from 2000 to 2011. Beyond mapping litter and soil C stocks, it is also important to map soil organic matter at a large scale, but traditional field collection techniques are cost- and effort-intensive. Many researchers have thus turned to RS imagery and/or ML to map soil organic matter, but selecting the right input data or ML model for prediction remains difficult. To determine how different datasets and ML models perform on GEE in predicting soil organic matter, ANN, RF, and SVR models were compared in [222] with MODIS, Sentinel-2A, and DEM data as input.
Accurate soil moisture content information is crucial for correctly modeling water, energy, and carbon cycles, as well as for understanding and predicting natural hazards like droughts, floods, and landslides. However, most soil moisture datasets are created at medium or coarse spatial resolution. Using optical, thermal, and SAR imagery in addition to DEM data, a global, high-resolution soil moisture map was produced in [221]. The authors concluded that optical RS imagery and land-cover information play the most important roles in determining soil moisture content, but SAR imagery and soil data also contribute significantly to the model’s overall performance. This finding echoes other studies’ results ([95,161,182]) that combining optical and SAR data improves predictive outcomes. Soil salinity can impact agricultural yields and is a global issue, but current datasets like the Harmonized World Soil Database have low spatial resolution and need to be updated. Because it is one of the main soil salinity datasets in use, it is difficult to estimate up-to-date soil salinity levels even as they change with increasing drought severity driven by global warming. The authors in [218] explored GEE’s potential to make a global soil salinity map based on field data and Landsat thermal infrared imagery.

3.2.13. Cloud Detection and Masking

There are five studies (four presenting novel methods) that used GEE and AI for cloud detection and masking. Table 13 below summarizes those studies, and a word cloud generated from the titles, keywords, and abstracts of those papers is provided in Figure 18. The most frequently used words are “Google Earth Engine”, “cloud(s)”, and “masking”. The main task in this literature domain is masking clouds in optical imagery (which is not a problem for SAR data), so the words “optical”, “Landsat-8”, and “Sentinel-2” are also prominent in this word cloud. From our interactive web app (see Appendix A) and Table 13, the most-used RS datasets are Landsat 8 OLI, SRTM DEM, and Google Earth. The most popular model is Fmask, and the most-used evaluation metrics are CE, OA, OE, and RMSE. A brief summary of those studies is provided below Table 13. More detailed textual summaries for each of the five studies are detailed in Appendix C.13.
Many mapping and identification tasks that use RS imagery and ML rely on cloud-free optical imagery, making cloud detection and removal a difficult but important preprocessing step. Many algorithms, including Fmask, a commonly used algorithm for creating cloud masks in RS imagery, rely on thresholds applied to single RS images, which makes them prone to errors when applied to entire RS time series. The authors in [223] instead treated cloud detection as a change detection problem across time using a kernel ridge regression model.
Optical RS imagery has many applications across environmental and earth science domains. However, optical RS imagery is often occluded by clouds, limiting its utility. While processing techniques like monthly composites can remove clouds to some extent, they rely on having enough cloud-free imagery to build the composites, which is not always available. Recently, DL models have shown the ability to reconstruct scenes in optical RS imagery that are blocked by clouds. However, researchers looking to use DL models in cloud environments often have to coordinate across different storage, analysis, and ML platforms (e.g., Google Cloud Storage, Google Colab, Google AI), which can be cumbersome and expensive. The authors in [227] thus decided to implement their cloud-removal DL model directly in GEE. Their model, DeepGEE-S2CR, is a cloud-optimized version of the DSen2-CR model presented in [228] and fuses co-registered Sentinel-1 and Sentinel-2 images from the SEN12MS-CR dataset.
Cloud detection is a well-studied task, and GEE has several cloud detection/masking algorithms available on its platform. However, some of them have been shown to be unstable, leading to considerable under- or overestimation. To explore how CV algorithms and ML models can be used together on GEE, [226] combined the existing Cloud-Score algorithm with an SVM to detect clouds in imagery from the Amazon tropical forests, Hainan Island, and Sri Lanka. Fmask is the most commonly used method but has limited use in mountainous regions, where terrain and shadows can be confused for clouds, or when sudden changes in the Earth’s surface occur in time-series imagery. A convolutional neural network (CNN) called DeepGEE-CD was built in [225] to detect clouds in RS imagery directly on the GEE platform. Cloud screening may also be cast as an unsupervised change detection problem in the temporal domain: a cloud screening method based on detecting abrupt changes along the time dimension was introduced in [224], assuming that image time series vary smoothly over land (background) and that abrupt changes are mainly due to the presence of clouds.
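In practice, Fmask-style results are usually consumed through bit-packed per-pixel QA bands. Decoding works the same way in any language; a sketch assuming bit 3 marks cloud and bit 4 cloud shadow, as in the Landsat Collection 2 QA_PIXEL band (verify bit positions against the product documentation before use):

```python
CLOUD_BIT = 3         # assumed positions (Landsat Collection 2 QA_PIXEL);
CLOUD_SHADOW_BIT = 4  # check the data product's documentation before use

def is_clear(qa_value):
    """A pixel is 'clear' if neither the cloud nor the cloud-shadow
    flag is set in its packed QA integer."""
    cloudy = bool(qa_value & (1 << CLOUD_BIT))
    shadowed = bool(qa_value & (1 << CLOUD_SHADOW_BIT))
    return not (cloudy or shadowed)

# Four synthetic QA values: clear, cloud, shadow, cloud + shadow.
qa_values = [0, 1 << 3, 1 << 4, (1 << 3) | (1 << 4)]
clear = [is_clear(q) for q in qa_values]
```

In GEE, the same bitwise test is applied image-wide with `bitwiseAnd` on the QA band to build a mask.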

3.2.14. Wildlife and Animal Studies

Wildlife and animal studies is one of the less developed application domains using GEE and AI (four studies total). Table 14 below summarizes those studies, and a word cloud generated from the titles, keywords, and abstracts of the four studies is provided in Figure 19. The most frequently used words are “insect”, “bird”, “roadkill”, and “malaria”, which show what scientists are trying to monitor in this domain, while words like “forest”, “water”, “Peru”, and “Amazonian” reflect where and in what environments the studies are being done. From our interactive web app (see Appendix A) and Table 14, the most popular AI model is RF and the most-used evaluation metric is OA. A brief summary of those studies is provided below Table 14. More detailed textual summaries for each of the four studies are detailed in Appendix C.14.
UAS (i.e., drones) can collect high-quality data over large aggregations of wildlife, offering an attractive opportunity to improve methods and increase the cost-effectiveness of monitoring wildlife populations. The authors in [229] explored the use of UAS for identifying Ny. darlingi breeding sites with high-resolution imagery (~0.02 m/pixel) and their multispectral profile in Amazonian Peru. Land use changes such as deforestation, irrigation, wetland modification, and road construction may drive infectious disease outbreaks and interfere with their transmission dynamics; accurate classification of Ny. darlingi-positive and -negative water bodies would increase the impact of targeted mosquito control on aquatic life stages. Researchers in [231] developed a semi-automated framework for monitoring large complex wildlife aggregations using drone-acquired imagery over four large and complex waterbird colonies.
The success of conservation and mitigation management strategies may greatly depend on the knowledge of the temporal and spatial patterns of roadkill risk, and its relationship with key environmental drivers. The authors in [230] used a set of freely available environmental variables, namely habitat information from RS observations and climatic information from weather stations, to assess and predict the roadkill risk.
Pest outbreaks are causing increasing damage to forests around the world as winters get warmer and summers become drier and start earlier. These conditions allow pests to proliferate, though pests do not always kill trees outright; they often defoliate trees, weakening them ahead of future pest outbreaks or drought conditions. However, forest defoliation is understudied, and much of the research in this area relies on coarse-resolution data. Using Landsat RS imagery, climate variables, and government environmental data, the authors in [232] analyzed Pine Processionary Moth outbreaks in pine forests in southern Spain.

3.2.15. Archaeology

Archaeology is also one of the less researched applications using GEE and AI (three studies total). Table 15 below summarizes those studies, and a word cloud generated from the titles, keywords, and abstracts of the three papers is provided in Figure 20. The most frequently used words are “Google Earth Engine”, “detection”, “satellite”, “drone”, and “survey”, while terms like “automated” and “mounds” are also common. This reflects the reviewed papers’ focus on using the GEE platform to scale up and automate exploratory surveys using RS data, both from satellite platforms and self-collected drone imagery. From our interactive web app (see Appendix A) and Table 15, the most frequently used RS dataset is WorldView-2. The most popular ML model is RF and the most-used evaluation metric is visual analysis.
Utilizing RS imagery for anthropological studies can be difficult because of a lack of financial resources, technical training, or the compute needed to analyze large RS datasets. It is also difficult to pair legacy field data with RS imagery when searching for mounded sites and scattered materials that would indicate past human habitation. When archaeologists look for potsherds, either in the field or at development sites, the standard practice is to conduct walking surveys to detect evidence of prior human settlement. This usually involves a large group of people walking in parallel lines over a given area, documenting what they find along the way, which entails substantial upfront personnel costs. The authors in [233] demonstrated the potential role of GEE in the future of archaeological research through two case studies. The authors in [234] used drone imagery and GEE to detect potsherds in the field in the hope of speeding up this process. In [235], the authors utilized optical and SAR data on GEE to create a classifier that outputs the likelihood that a mounded site exists in a given region of the Cholistan Desert in Pakistan. More detailed textual summaries for each of these three studies are provided in Appendix C.15, as all three proposed novel methods.

3.2.16. Coastline Monitoring

Coastline monitoring is one of the less researched applications using GEE and AI (three studies total). Table 16 below summarizes those studies, and a word cloud generated from the titles, keywords, and abstracts of the three papers is provided in Figure 21. The word cloud reflects both the general and specific focus of these papers. For example, the most frequently used general words are “shoreline”, “coastline”, “tidal”, and “beach”. This type of research is interested in first detecting coastlines, but also in monitoring geospatial changes over time (i.e., the keywords “detection”, “position”, “changes”, “temporal”, “time”, and “multi-annual”). From our interactive web app (see Appendix A) and Table 16, the most-used RS datasets are Landsat 5 TM, Landsat 7 ETM+, and Landsat 8 OLI.
Observing and quantifying the changing position of shorelines is critical to present-day coastal management and future coastal planning. The authors in [236] presented an automated method to extract shorelines from Landsat and Sentinel satellite imagery. The authors in [237] evaluated the capability of satellite RS to resolve, at differing temporal scales, the variability and trends in sandy shoreline positions. In [238], the authors proposed a method to map continuous changes in coastlines and tidal flats in the Zhoushan Archipelago during 1985–2017, using Landsat images on the GEE platform. More detailed textual summaries for each of those three studies are provided in Appendix C.16.
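Automated shoreline extraction pipelines of this kind typically compute a water index (e.g., NDWI or MNDWI) and then pick the land/water split automatically; Otsu's histogram thresholding is a common choice for that step. A minimal, pure-Python sketch of the thresholding idea on synthetic index values (an illustration of the general technique, not the exact procedure of [236]):

```python
def otsu_threshold(values, nbins=32):
    """Return the threshold that maximizes between-class variance --
    commonly used to split a bimodal water index into land and water."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / nbins or 1.0
    hist = [0] * nbins
    for v in values:
        hist[min(int((v - lo) / width), nbins - 1)] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var, w_bg, sum_bg = 0, -1.0, 0, 0.0
    for i, h in enumerate(hist):
        w_bg += h
        if w_bg in (0, total):
            continue
        sum_bg += i * h
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / (total - w_bg)
        var = w_bg * (total - w_bg) * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, i
    return lo + (best_t + 1) * width  # threshold at the bin's upper edge

# Bimodal NDWI-like sample: land pixels near -0.4, water pixels near +0.5.
ndwi = [-0.42, -0.40, -0.38, -0.41, -0.39, 0.48, 0.50, 0.52, 0.49, 0.51]
t = otsu_threshold(ndwi)
```

Pixels above the threshold are classed as water; the shoreline is then traced along the land/water boundary, often with sub-pixel refinement.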

3.2.17. Bathymetric Mapping

There are only two bathymetric mapping studies leveraging GEE and AI. Table 17 below summarizes those studies, and a word cloud generated from the titles, keywords, and abstracts of the two papers is provided in Figure 22. The most frequently used words are “bathymetry”, “satellite”, and “satellite-derived”, as well as “validation”. Currently, bathymetric maps are derived from radar, sonar, and light detection and ranging (LiDAR) measurements collected from boats and small aircraft, in conjunction with model simulations. The authors using GEE for bathymetric mapping research are trying to use satellite imagery and ML on the cloud platform to generate bathymetric maps over much larger scales than would otherwise be possible.
Mapping bathymetry across large areas is a difficult problem, in part because high-resolution aerial radar data, which produce some of the best bathymetry maps, are expensive to collect and only cover small areas. Researchers in [239] paired field observations of coastal depths with RS imagery to train multiple linear regression models that can then predict depth in areas where no depth information is available. Without accurate bathymetry information, ships risk getting stranded in shallow water areas around the globe. Typically, ships equipped with sonar and planes carrying airborne LiDAR are used to obtain water depth measurements. However, sonar is not suitable for shallow water measurements, and airborne LiDAR is expensive to acquire. Moreover, very few bathymetry datasets have global reach. The authors in [240] used airborne LiDAR, sonar, and Landsat data to estimate bathymetry in Japan, Puerto Rico, the USA, and Vanuatu using an RF model. More detailed textual summaries for each of those two studies are provided in Appendix C.17.
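One widely used image predictor for such regressions is the blue/green log-band-ratio of Stumpf et al., which makes the calibration a simple least-squares fit of depth against the ratio. A hedged, pure-Python sketch on synthetic data (the single-predictor fit and all reflectance/depth values below are illustrative, not the actual model or coefficients of [239]):

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def log_ratio(blue, green, n=1000.0):
    """Stumpf-style band-ratio predictor: the blue/green log ratio
    tends to grow with depth over a uniform bottom."""
    return math.log(n * blue) / math.log(n * green)

# Synthetic calibration points: (blue, green) reflectance with known depth (m).
samples = [(0.12, 0.10, 2.0), (0.11, 0.08, 4.0),
           (0.10, 0.06, 7.0), (0.09, 0.05, 9.0)]
xs = [log_ratio(b, g) for b, g, _ in samples]
depths = [d for _, _, d in samples]
slope, intercept = fit_line(xs, depths)
predicted = slope * log_ratio(0.10, 0.06) + intercept  # depth at a new pixel
```

Once calibrated against field or LiDAR depths, the same linear model can be applied pixel-wise to whole satellite scenes, which is what makes the approach attractive at GEE scale.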

3.2.18. Ice and Snow

There are only two ice and snow studies that have leveraged GEE and AI. Table 18 below summarizes the two studies, and Figure 23 below is the word cloud generated from the titles, keywords, and abstracts of the two studies. The authors are interested in measuring “changes” and “trends” in “ablation”, “break-up”, “freeze-up”, “freezing”, “phenology”, “subsistence”, and “reflectance” levels in “ice” and “snowfields”. From our interactive web app (see Appendix A) and Table 18, we found that the most-used RS datasets are Landsat 5 TM, Landsat 7 ETM+, and Landsat 8 OLI.
Global warming is putting pressure on Arctic ice and snow cover, as the Arctic is heating up much more rapidly than the rest of the planet. In Alaska, changes in perennial snow cover have wide-ranging implications: they change hydrology and vegetation patterns, alter the local topography through more frequent freeze–thaw cycles, and disrupt the ability of subsistence hunters in the region to find food. The authors in [241] used a CART model to track changes in the Alaskan cryosphere. The duration and seasonality of lake ice are sensitive to local environmental changes such as wind, air temperature, and snow accumulation, and lake ice phenology (LIP; ice breakup and freeze-up dates and ice duration) is a particularly robust proxy for climate variability. The authors in [242] studied LIP in Qinghai Lake, China. More detailed textual summaries of those two studies are provided in Appendix C.18.

3.3. Advances in Methods

In this section, we provide a summary of all 21 novel-method papers (i.e., those marked with * in the tables in Section 3.2 above). Specifically, see Table 19 below for novel methods for classification tasks, Table 20 for segmentation tasks, and Table 21 for regression tasks. A word cloud for all 21 novel-method papers (i.e., the papers in Table 19, Table 20 and Table 21) is provided in Figure 24. The most frequently used words are “Google Earth Engine”, “classification”, and “machine learning”. Despite being smaller research domains (in terms of total paper count in this GEE + AI review), archaeology and cloud detection and masking research contributed many novel ways to use CV, ML, and DL methods on the GEE platform; in the word cloud, “cloud”, “masking”, “archaeology”, “archaeological”, and “survey” reflect this influence. “Urban”, “water”, and “fire” also appear in the word cloud, as those domains contributed novel-method papers as well. The “cover” and “surface” keywords could refer to the LULC, vegetation, water, or infrastructure domains. These novel methods are detailed in the subsections of Section 3.2, and recommendations inspired by them are provided in Section 4.2 and Section 4.3. It is interesting that there are only three archaeology studies (Section 3.2.15), yet all three proposed novel methods.

4. Challenges and Research Opportunities

This section provides a summary of the patterns observed (Section 4.1) from reviewing the research discussed above. Section 4.2 and Section 4.3 describe the challenges and research opportunities from application (Section 4.2) and technical (Section 4.3) perspectives.

4.1. Summary and Discussion

4.1.1. Brief Summary of Reviewed Studies

Our comprehensive and interactive review indicates that the integration of GEE and ML (such as RF) is relatively straightforward, since model training can run directly on GEE, whereas the integration of GEE and DL is less intuitive and convenient (e.g., DL is not supported directly in GEE, as detailed in Section 4.1.2 below; researchers need to train DL models outside GEE, either offline on their local computers or in Google Cloud AI). However, the literature confirms that the integration of GEE and AI is becoming more widespread for geospatial analysis across a range of domains (Section 3.2). The expanding range of applications and the increasing integration of AI methods into GEE observed in the reviewed literature affirm their potential to enable effective and accurate RS systems at a variety of scales.
Among the 200 reviewed studies, the most frequently used RS data are (see Figure 4c for details): Landsat 8 OLI (74 studies), Landsat 5 TM (49 studies), and Landsat 7 ETM+ (48 studies). RF (125), SVM (40), and CART (38) are the most popular ML models (see Figure 5a for details). It is not surprising that RF is the dominant model, as RF is a widely accepted and efficient ensemble learning model that has demonstrated the ability to cope well with a number of common ML problems (e.g., imbalanced data, missing values, the presence of outliers, and overfitting) [243]. Among the reviewed studies, the majority used ML (181), while only a small portion used DL (22) or CV (16); this is not surprising, given GEE’s limitations (Section 4.1.2). Note that these numbers do not add up to 200 because some studies used combinations of ML, DL, and CV and so were counted multiple times. Among the 22 DL studies, most had to run their DL models either offline on local computers or on the Google Cloud AI platform. Only a very small portion of studies (Section 4.3.1) actually integrated GEE with DL, and only indirectly: the DL models were trained offline or on Google Cloud AI, and the weights were then uploaded to GEE to perform online prediction there. The most-employed evaluation metrics are OA (137 studies), PA (101), UA (98), and Kappa (76) (see Figure 5b for details). All 200 papers that we reviewed utilized GEE for data processing, while 104 papers also ran computation offline.
While the research investigated in Section 3 has demonstrated the power of using GEE and AI for many different problem domains, most of the studies use GEE’s built-in ML methods (e.g., RF, SVM, and CART). There is still a long way to go before researchers can easily develop, implement, test, and use novel AI methods (especially DL) on the platform (see Section 4.1.2), due to bottlenecks in integrating GEE with Google Cloud AI. Some thematic areas are saturated with application-oriented papers, as is evident from the list and number of citations in each subsection of Section 3.2. Our recommendation is that, for these areas (e.g., crop mapping and LULC), journals accept fewer application-based papers (unless they contribute new datasets or processing pipelines for working with multiple datasets) and start calling for novel method-based papers. Other areas (e.g., archaeology and bathymetry), however, could benefit from more use-case or proof-of-concept papers that open-source their code and data, speeding up the pace of research in those fields.
From our interactive web app tool (see Figure 20 below), we noticed that most work does not include hardware and software specifications (e.g., what CPU/GPU the authors used to run their models, what Python libraries they used to implement the DL models, etc.) and/or processing times [244]. Of the 200 total papers we reviewed, 101 ran strictly in cloud computing environments (i.e., they had no offline component). Of the papers that did run computation offline, only 10 provided their computation specifications (see Figure 25b for details). As Figure 25a shows, most work integrating GEE with AI ran on the GEE cloud platform. Of these papers, 98 (those marked as NA, “not applicable”) ran solely on cloud platform(s), and 92 (those marked as NS, “not specified”) ran locally without giving the hardware specifications of the machines or the runtimes of their analyses. Of the studies that used cloud computation, the majority ran on GEE, while a few combined GEE and the Google AI platform. A visual summary of the software used in the reviewed literature is provided in Figure 26. If a publication only used GEE or its APIs, it is given a value of “NA” (“not applicable”), since no additional software was used; we can see from Figure 26 that 96 papers fall into this category. Of the remaining papers that specified software used to complete part of an analysis outside of GEE, 27 studies used R, 23 used Python, 19 used ArcGIS, and 10 used the scikit-learn Python package. To make models comparable and reproducible, and to inform the design of RS systems, it is important to report this type of information [245]. This is true even for index-based methods and more traditional ML models, so that researchers can fully evaluate the trade-offs between runtime, accuracy, and ease of implementation. The interactive web app tool that accompanies this review is intended, in part, to make future research more reproducible.
Most papers have an open-access PDF/HTML version of their manuscripts, though a sizable portion (42 of the 200 reviewed articles) do not. To increase the rate of progress in integrating GEE and AI, we suggest that authors provide an open-access version of their manuscripts whenever possible.

4.1.2. GEE Limitations

GEE serves as a great free-of-charge cloud platform for EO big data processing and analysis. Given the very large amounts of data and combinations of temporal domains utilized in [21], GEE was critical to enabling those investigations. The use of GEE also allowed several ML algorithms to be tested much faster than would otherwise have been possible. The oil palm classification demonstrated in [107] on GEE is useful for providing a quick understanding of the oil palm plantations present in a landscape. This in itself is advantageous for independent monitoring bodies, which can survey the landscape in question and conduct more detailed assessments if necessary. In the near future, it is foreseeable that a growing number of large-scale mapping and monitoring programs, enabled through the integration of AI with GEE, will emerge as critical tools to help scientists, managers, and policymakers understand and respond to our environment [136].
However, GEE also has multiple notable limitations. Many authors reported compute limits, a lack of processing methods, the inflexibility of the available models, and a lack of data as their main limitations (each detailed below). Some recommendations for future research derived from these limitations are provided in Section 4.2, Section 4.3 and Section 4.4.
Compute limits [17,55,78,83,85,87,93,141,155,171,200,217,234,239,240]: Authors often ran into memory errors when analyzing too many field samples/observations. This also happened when the input data were simply too large, and it was difficult to know beforehand whether intermediate processing steps would trigger this error. Thus, many authors had to export data as part of their analysis, either to access functionality not available on GEE or because staying on GEE would exhaust the free compute provided. For example, every image uploaded to GEE (at the time of this paper’s release) is limited to 10 GB [234]. As the authors of [234] used sub-centimeter drone imagery, they had to downsize each image before uploading it, resulting in a loss of resolution. A few quoted limitations follow:
“…The users are limited to approximately 1 million training points…, a limitation in using a high number of trees within GEE when the amount of field samples is high” [17].
“One of the disadvantages of using the GEE cloud computing platform is that it limits the number of field samples and input features. This is especially challenging when the analysis is applied to a large domain, which may reduce the efficiency of the implemented method” [171].
“…The current GEE pipeline for processing the available data on GEE through the Python or JavaScript APIs requires exporting large volumes of data to cloud or local storage … These processes are time consuming and require extra funds for cloud processing and cloud storage” [87].
A lack of processing methods/models/algorithms [17,21,35,46,81,85,93,101,107,110,120,141,152,158,159,160,201,215,217,221,225], reasons listed were:
There is a lack of domain-specific models and methods (GEE algorithms are general) because GEE is more developed in some areas (LULC, forest, vegetation, crop) than others [17,46,107,152,158,225,239];
No neural networks (NNs) are currently supported on GEE directly, but many authors use DL models for their research [46,55,71,107,136,161,166], and they either have to train their DL models offline or on Google Cloud AI, which is not free of charge. Authors can also use TensorFlow on Google Colab and Google Cloud AI but not directly on GEE. For example, in [225], “… limited by the computation resource of GEE, some specific convolution layers of DNN cannot be implemented in GEE. For example, a dilated convolution layer could not be achieved due to the fact that dilation is not supported in the convolution API provided by GEE. Conversion of other types of convolutions to the convolution used in this study may help to solve this problem and it needs further investigation…”. The authors in [156] mention, “… integration of the Google AI platform with GEE creates a versatile technology to deploy deep learning technologies at scale. Data migration and computational demands are among the main present constraints in deploying these technologies in an operational setting;”
SNIC is the only object-based segmentation method on GEE; authors wanted more “advanced methods” or simply more options;
Hyperparameter tuning is not possible on the platform [21], so many authors use local software (e.g., scikit-learn) for this purpose and then upload the models to GEE afterwards;
One of the benefits of using an RF model is that one can run a feature importance analysis afterwards to determine which input features contributed most to the model’s predictions. However, this extremely common and important operation is not possible on GEE.
Inflexibility of models [19,35,46,152,159]: This limitation is similar to the lack of models but differs in that it describes issues with models already available on GEE. For example, the authors in [35] emphasized, “A third limitation to the modeling approach described here is its current incomplete use of cloud-computing services, and reliance on desktop computer power to run the BRT models. Ideally, the modeling would be run within the same environment where the satellite data are preprocessed—Google Earth Engine—or a similar cloud-computing service offering similar levels of access to Sentinel datasets. GEE does currently provide machine-learning algorithms such as random forests, but these do not provide the flexibility that is currently offered within the BRT R functions”. This quote reflects both a lack of methods and model inflexibility. The authors in [46] found that the algorithms on GEE were generally not very flexible and that some preprocessing steps, such as handling missing data, were difficult to implement; they therefore performed all preprocessing outside of the GEE platform.
Lack of data [32,46,54,67,75,94,120,126,127,160,161,162,183,184,193,215,221]: This relates both to a lack of field observations and to a lack of curated RS datasets.
Not every data product is on GEE;
Authors specifically called for a Landsat-Sentinel combined dataset. This dataset could serve as the foundation for research in many different application areas by expanding both the spatial and temporal resolution available to researchers;
Very-high-resolution imagery is not on GEE, meaning that to validate GEE prediction results authors often need to download this data locally.
Importing and exporting data from GEE [83,126,193,198,234]: This process is time-consuming and results in lower-resolution classification maps. Nevertheless, many authors need to import or export data because of storage constraints on GEE.
Other limitations:
There is a delay between the time RS data become available and the time they are uploaded to the platform, limiting their utility for time-sensitive applications [213,214];
Authors might have a hard time porting programs to GEE from their own environments [81,136,217]. Cited issues were unfamiliarity with JavaScript, Python, or the GEE programming interface, along with the concern that not everyone has the skillset to implement models in GEE;
A concern that data and code will not be kept private for sensitive use-cases [217].
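Two of the gaps noted above, hyperparameter tuning and feature importance analysis, are commonly worked around by training offline with scikit-learn before mirroring the tuned settings on GEE. A minimal sketch of that offline tuning step follows; the data are synthetic stand-ins for samples exported from GEE, and the parameter grid is an illustrative assumption, not a recommendation from any of the cited studies.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for training samples exported from GEE
# (rows = pixels/field samples, columns = spectral bands/indices).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Grid-search the number of trees and tree depth offline,
# something not currently possible inside GEE itself.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [5, None]},
    cv=3,
)
search.fit(X, y)
print(search.best_params_)
```

The tuned settings could then be mirrored in a GEE classifier such as `ee.Classifier.smileRandomForest`, although the export format and exact workflow will vary by study.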

4.2. Challenges and Opportunities from an Application Perspective

Most of the current integrations of GEE and AI utilize data and models already available on GEE (detailed in Section 3.2). Only a few papers proposed novel methods (summarized in Section 3.3). Below, we provide some challenges and opportunities related to application-oriented research.

4.2.1. Proof-of-Concept for Less Researched Applications and Novel Methods for Saturated Application Domains

The authors in [107] point out, “… classification method demonstrated in GEE is useful to provide a quick understanding of oil palm plantations… This in itself is advantageous for independent monitoring bodies to conduct a survey of the landscape in question and conduct more detailed assessments if necessary.” For applications that are not yet well studied using GEE, it will be useful to run proof-of-concept experiments on the platform. These analyses will shed light on the limitations of doing domain-specific research there (e.g., whether the main barriers are a lack of data, a lack of preprocessing methods, a lack of AI methods, etc.).
Even for very saturated application domains (e.g., wetland mapping; see Section 3.2.6), there are few novel methods. We would like to clarify that this does not mean there is no innovation: researchers still build interesting preprocessing pipelines, create new datasets, and often use DL. Again, we take a very narrow view of “novel” in this paper, confined to how researchers use AI methods on the GEE platform. Researchers focused on wetland studies seem much more focused on using free compute, compiling and scaling up datasets over larger areas than would be possible on local machines, and creating open-source processing and visualization pipelines. Nevertheless, there is still a lot of room for novel methods in these saturated application domains; for example, it would be useful for a saturated application domain to experiment with novel methods developed for other domains. The web app we developed for this review paper will serve as an important tool for easily finding novel methods (see the demo video of the web app for how to find a novel-method paper; the link to the video is provided in Appendix A).

4.2.2. Using ML for Exploration/as an Aid to Human Expertise

At a certain point, it is difficult or impossible for humans to determine meaningful relationships in complex, highly dimensional data. One of the ways AI is most helpful is in data exploration. Still, the goal of EO-AI research should not be to automate away human expertise, since AI models cannot understand human values and are often heavily biased. In [233], the authors use an RF set to output probabilities instead of class predictions to identify possible archaeological features in Jordan. The authors in [235] do the same thing over a large desert in Pakistan, saving time and effort that would otherwise have required surveyors to spend time in potentially unsafe conditions. While [233,235] use ML models to prepare for fieldwork by identifying archaeological mounds over large areas, [234] used drone imagery and GEE to identify potsherds in the field, allowing surveyors to focus their attention on finding and cataloging them even over large areas. This exploration method has also been successfully demonstrated for burned area mapping in [191]. The authors use a similar process: they first use an RF model in probability mode to find a good “starting point” for classification, then tweak the probability threshold to remove false positives before using a pixel-aggregation algorithm to determine the final classification output. Their method shows good agreement with other commonly used burned area products, but with finer classification boundaries. To explore the potential to distinguish between subtypes of surface water body, the authors in [158] use slope, shape, phenology, and flooding information as input to an RF model to distinguish lakes, reservoirs, rivers, wetlands, rice fields, and agricultural ponds. They found that their method does not work very well for wetlands and that the OA across classes is not very high (85%).
However, the RF model they use is interpretable, and they show which subclasses are easier or more difficult to predict. Unfortunately, the entire preprocessing pipeline cannot be run directly on GEE: the shape features, which are crucial to the overall analysis, cannot be calculated on the GEE platform, so the authors first have to compute them in a local environment and then upload them.
It is important to note that the authors of these papers are actively changing the classification results, in some cases many times (over several iterations). They are thus introducing bias into their models, but the trade-off is acceptable if the emphasis is on exploration rather than statistical validity. This methodology is similar to an expert system in which domain experts use ML “collaboratively”, blending human expertise with the automation capabilities of AI. Still, these models need to be continuously tested on new data to make sure their probability thresholds remain accurate, and their predictions should not be taken at face value.
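The probability-mode workflow described above (train an RF, keep the class probabilities, then raise the decision threshold to suppress false positives) can be sketched with scikit-learn. The data here are synthetic stand-ins for the per-pixel RS features used in [191,233,235], and the threshold values are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 4))      # stand-in for per-pixel RS features
y = (X[:, 0] > 1.0).astype(int)     # rare positive class (e.g., mound or burned pixels)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Probability mode: keep P(positive) rather than the hard 0/1 label.
proba = clf.predict_proba(X)[:, 1]

# Default 0.5 threshold vs. a stricter one chosen to suppress false positives.
loose = proba >= 0.5
strict = proba >= 0.8
print(loose.sum(), strict.sum())    # stricter threshold flags fewer candidate pixels
```

In practice the stricter threshold would be tuned iteratively against known features, which is exactly the human-in-the-loop step that introduces the bias discussed above.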

4.2.3. More (High Quality) Data

Many authors suggested and called for more data to improve performance (e.g., [16,127,162]). However, as the study in [105] showed, more data are not always better. The authors in [105] investigated the difference in ML model performance when using single-image mosaics, time-series RS imagery, statistical features (median, standard deviation), band ratios, or all of these features. They tested this by training an RF model on each subset of data to create LULC maps in Brazil, and found that inputting a time series of the data is the most accurate option, more accurate even than using all of the computed indices and statistical features. The authors in [239] trained four different multiple linear regression models on sonar field data and optical RS imagery to map bathymetric depths in three locations near Greece. They obtained good results with a very simple, intuitive model. While current trends point to models of increasing complexity, it is important to note that a simpler model will often perform well given high-quality input data.
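The simple empirical approach in [239] can be illustrated with a small regression sketch. The band reflectances and depths below are synthetic, loosely imitating the log-transformed band-ratio form of classical empirical bathymetry models rather than the authors' actual data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in: log-transformed band reflectances vs. sonar depths,
# loosely following the band-ratio style of empirical bathymetry models.
rng = np.random.default_rng(1)
bands = rng.uniform(0.01, 0.2, size=(200, 3))   # blue, green, red reflectance
depth = 5.0 * np.log(bands[:, 1] / bands[:, 0]) + 10 + rng.normal(0, 0.1, 200)

# A plain multiple linear regression on log-bands recovers the relationship.
model = LinearRegression().fit(np.log(bands), depth)
print(round(model.score(np.log(bands), depth), 3))   # R^2 on the calibration data
```

With clean, high-quality calibration data, such a model fits almost perfectly, which is the point made above: simple models can suffice when the input data are good.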
The analysis in [246] suggests that an increased focus on dataset scaling is needed; the authors further emphasize that scaling to larger and larger datasets is only beneficial when the data are high quality. Meanwhile, as [247] emphasizes, a simpler NN with more data can be better than a bigger NN. This echoes the data-centric view of AI [248] proposed by the AI pioneer Dr. Andrew Ng, who argues that it is time for “data-centric” solutions to big issues and observes that 80% of an AI developer’s time is spent on data preparation [248]. Domain experts should therefore be involved in creating high-quality datasets, since they know the data sources and relevant input variables much better than AI engineers. Together with the authors in [246,247], we call for responsibly collecting larger datasets with a strong focus on dataset quality. We also call for researchers to share their datasets on GEE, which would be useful to a wide variety of domains and researchers. Potential recommendations in this direction are: (1) improving the quality of existing datasets, and (2) generating more data with a focus on good quality.

4.2.4. Feature Engineering and Feature Importance

Feature engineering (see Appendix A.1 in [1]) using RS data is difficult because it is time-consuming and relies heavily on human experience, domain knowledge, and technical expertise (in terms of location, what kind of data are being processed, what variables to look for, etc.). Meanwhile, feature engineering is often necessary for a given ML analysis because of the large amount of RS imagery arriving every day. Here, DL methods can help because they are able to recognize complex patterns in data without requiring feature engineering (feature engineering can still help a DL analysis, but it is not necessary; i.e., NNs can learn complex patterns from raw data). Still, the tradeoffs between traditional ML models and DL methods have not been properly mapped out for the RS space. The authors of [43] set a good example by addressing this issue: they compared the performance of an RF model (with feature engineering) to long short-term memory (LSTM) and U-Net NN models (without feature engineering) for identifying pasturelands. The RF model was trained on GEE, while the DL models had to be trained offline, as GEE does not currently support DL models (Section 4.1.2). The U-Net generalized best across both the validation and testing sets, maintaining high accuracy rates, while the LSTM and RF models underfit the test set. To illustrate the tradeoffs between ML and DL models, the authors included run and inference times: the RF model completed training and prediction in 3 h; the LSTM took 30 min to train but 23 h to predict on the test set; and the U-Net took 24 h to train but only 1.2 h at inference time. Much more work like this should be done to explore the strengths and weaknesses of ML and DL models, as it would benefit the many research areas that would like to take advantage of GEE and AI.
With proper features from feature engineering, ML algorithms, which require less (good-quality) training data than DL, often perform better than DL. For example, the authors in [136] reported that the classification accuracy of DL was not as good as that of traditional ML methods (e.g., SVM). We recommend the following three directions for future studies in terms of feature engineering.
(1) Compare multiple ML algorithms or ML vs. DL algorithms: As pointed out in [136], it is worth investigating which methods (ML vs. DL) are better for a specific domain application. Several ML models are compared in [115] to map oil palm using Landsat 8 imagery in Malaysia. The authors found that tree-based ML models (e.g., RF, CART) work better than an SVM for this task and can classify large areas with high accuracy. Even so, classification errors were traced to the relatively coarse resolution of the Landsat data, and the authors suggested that higher-resolution imagery (e.g., Sentinel) and the future ability to use DL methods on GEE will most probably yield higher performance. The authors in [136] developed and implemented a new pixel-based method (Ppf-CM) in GEE using 525 full Landsat scenes (19.96 billion pixels) to monitor S. alterniflora dynamics. They found that Ppf-CM not only enhances the spectral separability between S. alterniflora and other classes but also mitigates the problems caused by the scarcity of entirely cloud-free Landsat scenes. These findings echo prior GEE-supported pixel-based studies (e.g., [80]) and further confirm that pixel-based methods outperform scene-based methods for monitoring S. alterniflora. The classification results in [161] were evaluated using both the pixel-based and object-based RF classifications available on the GEE platform; the results revealed the superiority of the object-based approach over pixel-based classification for wetland mapping.
The authors in [46] compared several algorithms on the GEE platform, including CART, IKPamir, LR, a multi-layer perceptron (MLP), NB, RF, and an SVM, for crop-type classification. They also used an ensemble NN but had to move off the GEE platform, since NNs are not currently supported; the ensemble NN performed best out of all the models. The authors also found that atmospherically corrected Landsat data boosted model performance more than Landsat composites. The authors in [56] compared the performance of an artificial neural network (ANN) to CART, RF, and SVM models on GEE for sugarcane mapping in China using Sentinel-2 imagery. They identified the SVM as the best performer but then showed which types of error each model makes: the ANN tended to overfit the data and give too much preference to the sugarcane class, while the tree-based models confused the forest and water classes. The authors then incorporated Normalized Difference Vegetation Index (NDVI) information into the SVM to show how the model performs with this extra information. It is not clear why the authors did not let every model see the NDVI information, as it may have helped the other models learn better; if the goal was to contrast phenology information with phenology combined with NDVI, each model could have been trained on separate subsets of the data. While GEE allowed [159] to train several ML models, some models failed to run due to computational constraints or inflexibility. The authors showed that in all cases ML models do much better at binary than at multi-class classification. The authors in [66] utilized many of the ML algorithms available on GEE, compared specific time windows for phenological analysis, and found that the closer the data come to planting and harvesting time, the better the ML models performed.
(2) SAR + optical RS images for better model performance: Many studies reported [17,57,68,72,74,165,166,182,212,214] or suggested as future work [46,56,107,215] that SAR combined with optical RS images would improve model performance. Three classification methods (SVM, RF, and decision fusion) were used in [52] for pixel-wise crop mapping; the SVM classifier resulted in the lowest accuracy, and the integration of multispectral and SAR data improved the classification accuracy. To improve their results, the authors in [56] identify that SAR data would help remove the impact that shadows have on classification errors for sugarcane mapping. The authors in [95] compare the contributions of SAR data and of different indices derived from optical data (e.g., NDVI, EVI, Soil-Adjusted Vegetation Index (SAVI), Normalized Difference Water Index (NDWI)) to overall classifier performance. They find that including SAR data moderately improves performance, while only NDWI gives the ML model a significant performance boost. Using optical, thermal, and SAR imagery in addition to DEM data, [221] produces a global, high-resolution soil moisture map. The authors use a gradient boosted regression tree (GBRT) model trained on in-situ observations paired with RS imagery to predict soil moisture in other locations. From a relative variable importance analysis, they conclude that optical RS imagery and land-cover information play the most important roles in determining soil moisture content, but that SAR imagery and soil data also contribute significantly to the model’s overall performance. This finding corroborates other studies’ results ([95,161,182]) showing that the combination of optical and SAR data improves predictive outcomes.
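As a concrete illustration of combining SAR and optical inputs, the sketch below computes NDVI and NDWI from synthetic reflectance arrays and stacks them with a synthetic SAR band into a per-pixel feature matrix. The band names and value ranges are illustrative assumptions, not values taken from any of the cited studies.

```python
import numpy as np

def normalized_difference(a, b):
    """Generic normalized-difference index, e.g., NDVI = (NIR - Red) / (NIR + Red)."""
    return (a - b) / (a + b)

# Synthetic reflectance/backscatter rasters standing in for real imagery.
rng = np.random.default_rng(7)
red, nir, green = rng.uniform(0.05, 0.5, size=(3, 64, 64))
sar_vv = rng.normal(-12, 3, size=(64, 64))   # stand-in for SAR VV backscatter (dB)

ndvi = normalized_difference(nir, red)       # vegetation index
ndwi = normalized_difference(green, nir)     # water index

# Stack optical indices with SAR into one per-pixel feature matrix for an ML model.
features = np.stack([ndvi, ndwi, sar_vv], axis=-1).reshape(-1, 3)
print(features.shape)
```

The same pattern applies on GEE, where `ee.Image.normalizedDifference` and band stacking play the roles of the numpy operations above.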
(3) What input for what algorithms (feature importance): This direction differs from feature engineering in that it is less concerned with computing new features from existing data than with determining which input variables contribute most to model learning.
In [58], random samples extracted from the training pool, along with RS-derived features and climate variables, were used to train ecoregion-stratified RF classifiers for pixel-level classification. Evaluation of feature importance indicated that Landsat-derived features played the primary role in classification in relatively arid regions, while climate variables were important in the more humid eastern states.
To investigate how best to identify impervious materials in RS imagery regardless of cloud cover, [182] combine nighttime light, DEM, and SAR data with an RF model on GEE. Their resulting maps are more accurate than commonly used maps such as GlobeLand30. More importantly, the authors quantitatively show that using multiple sources of data is better than using a single source for this task; optical data are the most important, but SAR data improve accuracy across all metrics. In future studies, more work like this needs to be done so that researchers can save time and effort by knowing beforehand which data will be useful for a task. The authors in [178] compare different combinations of input data and their impact on model performance; for their application, Landsat 8 data serve as better input than Landsat 7 alone or Landsat 7 data with computed indices such as NDVI. Access to datasets like the one produced by [178] will make it much easier for future researchers to create more accurate building detection models, either by adding to the dataset and training ML models on it or by using it as one of several datasets incorporated into the same analysis.
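The relative variable importance analyses used in [58,182,221] can be reproduced offline with an RF's built-in importances (recall from Section 4.1.2 that this operation is not available on GEE itself). The sketch below uses synthetic features whose names are purely illustrative; the first feature is constructed to carry most of the signal, so it should dominate the ranking.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic per-pixel samples: the first feature drives the label, mimicking
# e.g. an optical band mattering more than SAR or DEM inputs.
rng = np.random.default_rng(3)
X = rng.normal(size=(600, 4))
y = (X[:, 0] + 0.2 * X[:, 2] > 0).astype(int)
names = ["optical", "nighttime_light", "sar", "dem"]

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Rank input variables by impurity-based importance (importances sum to 1).
ranking = sorted(zip(names, rf.feature_importances_), key=lambda t: -t[1])
print(ranking)
```

On real data, the ranking tells researchers which inputs are worth exporting and preprocessing in the first place, which is exactly the time-saving argument made above.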
It is important to be able to map not only the current state of wetland vegetation but also how that vegetation is changing over time. However, the different sets of input data and ML methods used for change detection of wetland vegetation need to be evaluated more fully, as choices made during preprocessing and hyperparameter tuning can affect the end result of an analysis. The authors in [138] use an adaptive stacking algorithm to train an ML classifier on optical, SAR, and DEM data to identify wetland vegetation. Adaptive stacking uses one ML classifier to identify the optimal combination of ensemble classifiers and hyperparameters for a given task; in this case, the authors use an RF model to determine the best combination of the CART, Minimum Distance (MD), Naive Bayes (NB), RF, and SVM classifiers on GEE. The authors find that the adaptive stacking method is much more accurate than the RF and SVM models alone. The resulting classification map is then combined with a trend analysis performed with the LandTrendr algorithm, which allows them to identify the current distribution of wetland vegetation and how it has changed over time. The authors in [138] also test their workflow on different subsets of input data and show that adding more data helped the adaptive stacking algorithm learn better (the best combination of input data was all of the data). The authors note that the forest and reed classes were not identified well by their adaptive stacking algorithm and that the LandTrendr algorithm will most likely need to be re-tuned in different environments.
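The adaptive stacking of [138] additionally searches over which base classifiers to keep; a simplified, fixed-stack analogue can be sketched with scikit-learn's `StackingClassifier`. The feature set below is a synthetic toy, not the optical/SAR/DEM stack the authors used, and the choice of base learners is an illustrative assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in for per-pixel features of wetland-vegetation samples.
X, y = make_classification(n_samples=400, n_features=8, n_informative=5, random_state=0)

# Fixed stack of base learners (CART, NB, SVM) combined by an RF meta-learner;
# the adaptive version in [138] additionally searches over which bases to keep.
stack = StackingClassifier(
    estimators=[
        ("cart", DecisionTreeClassifier(random_state=0)),
        ("nb", GaussianNB()),
        ("svm", SVC(probability=True, random_state=0)),
    ],
    final_estimator=RandomForestClassifier(random_state=0),
    cv=3,
)
stack.fit(X, y)
print(round(stack.score(X, y), 3))
```

The `cv=3` argument makes the meta-learner train on out-of-fold predictions of the base classifiers, which is what keeps stacking from simply memorizing the base models' training outputs.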
The authors in [15] integrated single-date features with temporal characteristics from six time-series trajectories (i.e., two Landsat shortwave infrared bands and four vegetation indices) to produce an intact-disturbed forest map for tracking degraded forests. The whole processing pipeline is run on GEE using an RF. The authors also ran a relative variable importance analysis for each ecoregion, and they show that past maps are outdated because of their inability to separate intact from degraded forest classes, although their results vary from ecoregion to ecoregion. The purpose of the study in [21] was to determine how the inclusion or exclusion of training data for RF models with RS and temporally variable climate variables influences model outcomes. Cloud computing on GEE was utilized in [35] to create an open-source, reproducible map of wetland occurrence probability using LiDAR and RS data for the entire province of Alberta. Using a BRT, the authors are able to match a current governmental effort in Alberta while also producing a relative variable importance analysis showing which RS variables might be the most useful for future wetland mapping efforts in the area.
The authors in [55] used a CNN–LSTM hybrid model to predict soybean yield in the contiguous United States using RS imagery alongside weather data, and showed that the hybrid approach works better than either a CNN or an LSTM alone, although results were better in some states than in others. Additionally, the authors created combinations of input data to determine which variables are most important in training their NN.
A low-cost method was demonstrated in [107] for monitoring industrial oil palm plantations in Indonesia using Landsat 8 imagery, distinguishing between oil palm, forest, cloud, and water classes using the CART, RF, and MD algorithms. Their results demonstrated that CART and RF achieved higher OA and Kappa coefficients than the MD algorithm. In addition, the authors in [107] compared model accuracy based on different combinations of spectral bands (particularly red-green-blue (RGB) and infrared bands, including shortwave infrared (SWIR), thermal infrared (TIR), and near infrared (NIR)), as well as all bands together, to determine which would help specifically with oil palm plantation monitoring.
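The OA and Kappa comparisons used in [107] (and throughout the studies above) can be derived directly from a confusion matrix. The following minimal sketch in plain Python uses illustrative matrix values, not the actual figures from [107]:

```python
def accuracy_metrics(confusion):
    """Overall accuracy and Cohen's Kappa from a square confusion matrix
    (rows = reference classes, columns = predicted classes)."""
    total = sum(sum(row) for row in confusion)
    # Observed agreement: proportion of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    # Expected agreement under chance: product of marginal proportions.
    expected = sum(
        (sum(confusion[i]) / total) * (sum(row[i] for row in confusion) / total)
        for i in range(len(confusion))
    )
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Illustrative 3-class matrix (e.g., oil palm / forest / water)
cm = [[50, 3, 2], [4, 40, 1], [1, 2, 47]]
oa, kappa = accuracy_metrics(cm)
```

Reporting Kappa alongside OA is useful because it discounts agreement that would be expected by chance given the class marginals.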
The authors in [136] used a specific invasive species in China as a case study for developing an ML pipeline that takes into account both cloud cover and phenological information. They compared the ability of a stacked autoencoder and an SVM to classify vegetation types. While the SVM was trained on GEE, the DL model had to be trained offline as the platform does not currently support DL models. The authors find that the DL model performs better than the SVM and that both models perform better with phenological information. The same species of plant can look different at different stages of its life while also being submerged under water in some RS scenes. The authors in [140] argue that phenological information in RS time series can better capture tidal flat wetland vegetation, and so compare phenological information against statistical (minimum, maximum, median) and temporal (quartile range) features. They then feed these data into an RF while analyzing their effect on model performance over different time periods (all data, green and senescence seasons) for wetland vegetation classification. The authors showed that the phenological information was the most important input feature to the RF, while combining all three sets of features led to the highest accuracy. In addition, the model performed best when predicting over both the green and senescence periods, most likely because this provides the model with a better estimate of the total variance needed to identify wetland vegetation. More research like this should be done to isolate the importance of individual input features and time periods in ML model performance. To explore the potential to distinguish between surface water body subtypes, [158] use slope, shape, phenology, and flooding information as input to an RF model to predict lakes, reservoirs, rivers, wetlands, rice fields, and agricultural ponds. Their method does not work very well for wetlands, and the OA across classes is only moderate (85%).
However, the RF model they use is interpretable, and they show which subclasses are easier or more difficult to predict. The authors in [192] found that Landsat 8 data led to higher fire burn estimates while still improving fire burn detection accuracy, and that both Landsat and Sentinel-2 catch more small fire patches than MODIS.
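The feature sets compared in [140], statistical (minimum, maximum, median) and temporal (quartile-range) features of a per-pixel index time series, are simple to compute. The sketch below uses illustrative NDVI values and hypothetical function names, not the authors' code:

```python
def time_series_features(values):
    """Statistical and temporal features of one pixel's index time series,
    in the spirit of the feature sets compared in [140]."""
    s = sorted(values)
    n = len(s)

    def quantile(q):
        # Linear interpolation between the closest ranks.
        pos = q * (n - 1)
        lo, hi = int(pos), min(int(pos) + 1, n - 1)
        frac = pos - lo
        return s[lo] * (1 - frac) + s[hi] * frac

    q1, q3 = quantile(0.25), quantile(0.75)
    return {
        "min": s[0],
        "max": s[-1],
        "median": quantile(0.5),
        "iqr": q3 - q1,  # interquartile range (temporal spread)
    }

ndvi_series = [0.21, 0.35, 0.62, 0.71, 0.55, 0.30]  # illustrative NDVI values
features = time_series_features(ndvi_series)
```

On GEE, the same per-pixel reducers are available as `ee.Reducer.minMax`, `ee.Reducer.median`, and `ee.Reducer.percentile` over an image collection.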
To determine the impact of using higher-resolution RS data products, the study in [192] compared how Landsat and Sentinel optical imagery affected an ML model’s performance in burn area classification. The authors used Weka clustering output and different spectral and index information as input to the CART, RF, and SVM models available on GEE. They find that both Landsat and Sentinel imagery produce much better maps, capturing small burn areas that current maps and fire monitoring products such as MODIS miss, though Sentinel imagery leads to an underestimation of burn area. The authors also find that the tree-based algorithms perform comparably to each other but much better than the SVM model. This study highlights the importance of analyzing different data sources and ML models to show their respective contributions to predictive performance.
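The differenced burn indices used by [192] are straightforward band arithmetic; for example, dNBR is the pre-fire minus the post-fire Normalized Burn Ratio, computed from NIR and SWIR reflectance. A sketch with illustrative reflectance values (not from [192]):

```python
def nbr(nir, swir):
    """Normalized Burn Ratio from NIR and SWIR reflectance."""
    return (nir - swir) / (nir + swir)

def dnbr(pre, post):
    """Differenced NBR: positive values indicate a loss of healthy
    vegetation between the pre- and post-fire scenes."""
    return nbr(*pre) - nbr(*post)

# Illustrative (NIR, SWIR) reflectances for one pixel
pre_fire, post_fire = (0.45, 0.15), (0.20, 0.30)
severity = dnbr(pre_fire, post_fire)  # > 0 suggests burning
```

On GEE, the equivalent operation is `image.normalizedDifference(['NIR', 'SWIR'])` on the pre- and post-fire composites, followed by a subtraction.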

4.2.5. Creative Integration of Existing Algorithms Available on GEE

Through in-depth exploration, domain experts can propose creative integrations of the CV/ML algorithms already available on GEE, which provides an extensive cloud platform for training and classification with ML algorithms. The authors in [192] studied and evaluated the potential of medium-resolution satellite imagery from Landsat-8 OLI and Sentinel-2 to precisely estimate forest burnt area over Uttarakhand, Himalaya. Specifically, they used pre- and post-fire differential reflectance to capture fire patches via “differenced” burn-sensitive spectral indices (dNBR, dNDVI, dNDWI, and dSWIR; see the abbreviations list right before the References). An unsupervised Weka cluster layer, used as input to the ML algorithms along with the differenced indices, played an important role in recognizing the pattern and expansion of fire patches. Among the three ML algorithms, CART and RF achieved better accuracy (Kappa) than SVM. To explore how CV algorithms and ML models can be used together on GEE, the authors in [226] combine the existing Cloud-Score algorithm with an SVM to detect clouds in imagery ranging from the Amazon tropical forests to Hainan Island and Sri Lanka. The Cloud-Score algorithm first masks the input RS imagery, and its output is then used to train the SVM. This process led to much higher accuracy rates than any of the other CV algorithms for cloud detection, with considerably lower error rates. The authors in [150] analyze to what degree different preprocessing steps affect output water maps, using both SAR and DEM data and two variations of Otsu’s thresholding algorithm. They showed that SAR data that include radiometric terrain correction (RTC) as a preprocessing step yield more accurate results, and that Bmax Otsu thresholding is more stable to different inputs than Edge Otsu.
However, their analysis was limited in time and space, so more work needs to be done to test their results in different locations and varying terrain types at different times.
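Both thresholding variants in [150] build on Otsu's method, which picks the threshold that maximizes the between-class variance of a histogram. A minimal plain-Python sketch on a toy backscatter sample (not the authors' GEE implementation, which operates on `ee.Image` histograms):

```python
def otsu_threshold(values, bins=32):
    """Return the bin edge that maximizes between-class variance."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    total = len(values)
    # Sum of bin-center * count over the whole histogram.
    s_all = sum((lo + (i + 0.5) * width) * c for i, c in enumerate(hist))
    best_t, best_var = lo, -1.0
    w0 = s0 = 0.0
    for i in range(bins - 1):
        w0 += hist[i]
        s0 += (lo + (i + 0.5) * width) * hist[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = s0 / w0, (s_all - s0) / w1
        var = w0 * w1 * (m0 - m1) ** 2  # between-class variance
        if var > best_var:
            best_var, best_t = var, lo + (i + 1) * width
    return best_t

# Bimodal toy "backscatter" sample: water cluster near -20 dB, land near -8 dB
sample = [-21, -20, -19.5, -20.5, -19, -8.5, -8, -7.5, -9, -8.2]
threshold = otsu_threshold(sample)
```

The Bmax and Edge variants differ mainly in how the histogram is sampled (checking a bimodality measure, or restricting sampling to buffered edge regions) before this same maximization step is applied.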
Model stacking, ensemble learning, and label estimation: Many authors use or test multiple CV, ML, and DL models in their research. Still, it is difficult to tune hyperparameters or choose threshold values, both of which affect the end result of a given analysis. To alleviate these problems, several authors we identified in our review use different models to automate the hyperparameter tuning process. For example, in [135], the authors used two different RF models to produce maps of vegetation change, as detailed in Section 3.2.4. The authors in [138] used an RF model to train a separate ensemble classifier made up of CART, MD, NB, SVM, and another RF model. The first RF chose the best combination of models and each model’s respective hyperparameters. This ensemble model performs better at wetland detection than any of the models individually. In [185], the authors use a very similar method (this time for building detection), though the final ensemble is chosen via a manual weighting process.
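The selection idea behind adaptive stacking can be illustrated with a toy sketch: an exhaustive search over subsets of simple threshold "classifiers" stands in for the RF-based selection in [138], and majority voting stands in for the ensemble. All names and data here are hypothetical:

```python
from itertools import combinations

def majority_vote(classifiers, x):
    """Binary majority vote over a set of base classifiers."""
    votes = sum(clf(x) for clf in classifiers)
    return 1 if votes * 2 > len(classifiers) else 0

def select_ensemble(classifiers, val_set):
    """Pick the subset of base classifiers whose majority vote scores
    highest on a validation set. [138] instead train an RF to make this
    selection; the exhaustive search here is a toy stand-in."""
    best_subset, best_acc = None, -1.0
    for k in range(1, len(classifiers) + 1):
        for subset in combinations(classifiers, k):
            acc = sum(
                majority_vote(subset, x) == y for x, y in val_set
            ) / len(val_set)
            if acc > best_acc:
                best_subset, best_acc = subset, acc
    return best_subset, best_acc

# Three hypothetical threshold "classifiers" on a 1-D feature
clfs = [lambda x: int(x > 0.3), lambda x: int(x > 0.5), lambda x: int(x < 0.9)]
val = [(0.1, 0), (0.2, 0), (0.6, 1), (0.8, 1), (0.95, 1)]
subset, acc = select_ensemble(clfs, val)
```

An RF-based selector scales far better than this exhaustive search once the number of base models and hyperparameter settings grows.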

4.2.6. Beyond ML: Modeling in GEE

A majority of the papers we reviewed used data and algorithms on GEE to complete their analyses. These “proof-of-concept” papers are often explorations by authors into how to use the platform, or demonstrations that research typically done offline can be done in the cloud. However, many of the most straightforward classification and regression applications have now been sufficiently demonstrated. In the future, one area of research that should be given much more focus is implementing more complex production applications built on top of GEE that make use of modeling. For example, after mapping urban areas using ML classification techniques, the authors in [94] went further and implemented an ecosystem service value model on the platform. This is a creative way to use the parallel processing capabilities of GEE and is an under-researched application area.

4.3. Challenges and Opportunities from a Technical Perspective

Making the integration between GEE and AI more seamless would allow researchers and practitioners in various domains to better take advantage of GEE and AI. From our systematic review, we provide some identified future challenges, opportunities, and recommendations below for researchers, practitioners, and engineers (including GEE engineers at Google) to consider. Note that some of the recommendations are general directions (Section 4.3.1, Section 4.3.2, Section 4.3.3, Section 4.3.4 and Section 4.3.5) and others are more specific (Section 4.4).

4.3.1. Model Implementation and Online Learning in GEE

GEE has a large data catalog and houses many preprocessing methods, as well as various CV and ML algorithms. However, an often-cited limitation (Section 4.1.2) to research on GEE is that there are not enough methods or models implemented on the platform. Thus, a promising applied research direction for using GEE and AI is to implement and test CV or ML models that would be useful to other researchers. For instance, the authors in [141] developed a GPR model on GEE that works for both vector and tensor input. This model was also cloud-optimized specifically for the size limits of GEE, making it a lightweight but accurate option. Similarly, the authors in [84] implemented an unsupervised Bayesian model in GEE for LULC classifications.
Many GEE and AI analyses rely heavily on optical imagery. Obtaining enough cloud-free RS imagery can be difficult, though there are methods such as Fmask that help remove clouds when they are present in RS scenes. Still, options for cloud-removal algorithms on GEE are limited. The authors in [223,224] both propose new methods for cloud removal over large areas that can be run directly on GEE, without the need to download data. In [224], the authors show that their proposed method outperforms popular algorithms such as Fmask and Automated Cloud Cover Assessment (ACCA) by 4–5% in accuracy. The authors in [223] tested their algorithm on both Landsat and Satellite pour l’Observation de la Terre (SPOT) optical imagery. Along with the authors in [107,239], we call for more domain-specific novel AI methods to be implemented on the platform, which would be useful to a wide variety of researchers. To move towards the seamless integration of GEE and AI, we suggest the following three directions.
(1) Simple but robust ML/CV methods: Given current GEE limitations (see Section 4.1.2), one promising way to make the integration of GEE and AI smoother, deeper, and more robust is to develop simple, novel, and robust CV/ML methods. An instructive example is the Canny edge detector: developed by John F. Canny in 1986, it is still widely used in edge detection applications today (cited 39,549 times as of 17 April 2022, including by 1953 papers in 2021; these citations account for roughly 60% of the author’s 66,796 total Google Scholar citations). We list this example explicitly to show that it is worth devoting time to developing simple but robust CV/ML algorithms. The Canny edge detector is robust and used in many image processing applications; however, it may not be appropriate for RS images in general or for RS images in a specific domain. We call on AI and RS researchers and engineers to develop robust, novel, and ideally computation-optimized RS-image processing algorithms towards the smooth and robust integration of GEE and AI.
(2) Reimplementing and/or optimizing (both classic and state-of-the-art) CV/ML methods on GEE: The authors in [107] pointed out a need for more and better algorithms on the GEE platform. The authors in [141] implemented GPR, which is increasingly used because it is a transparent ML model that also outputs model uncertainties. The method in [141] targets green Leaf Area Index (LAI) retrieval from RS imagery and is optimized for GEE (detailed in Section 3.2.4). First, they created the model so that it can run on vector or tensor time-series imagery. Then, the authors used active learning (AL) for feature reduction so that the model learns only on important data and can run within GEE’s memory confines. This GPR model is then used to gap-fill RS imagery for LAI, meaning the model is able to “see” through clouded optical imagery. More work like this should be done, either by creating new models in the cloud that other researchers can use, or by optimizing existing models so that they are memory efficient and can leverage GEE in the cloud, instead of requiring preprocessing and model training on local computers or on Google Cloud AI. The authors mentioned that better GEE code documentation and error messages could help future researchers interested in developing custom ML models for the platform.
(3) DL with GEE: DL models are not currently available on the GEE platform (Section 4.1.2). However, some authors [69,151,225,227,228] have found an interesting workaround that allows them to use NN models directly in the cloud. These authors first train an NN model outside GEE and then upload the weight matrices as data files that can be read by the JavaScript or Python development environments. Each layer in the network (convolutional layers, activation layers, etc.) must then be implemented by hand so that imagery can be run through the NN at inference time to produce predictions. This method has worked across domains such as water extraction, cloud detection, and crop mapping. Still, there are several caveats to this approach. First, researchers need access to the compute required to train the NN model in the first place; researchers are often drawn to GEE because of its freely available compute, so this method is mainly geared towards those looking specifically to use NNs. Researchers also need to know how to implement and test different layers in an NN, a task that many EO researchers may not have experience with. Lastly, none of the authors listed above implemented the full training process (e.g., forward and backpropagation) on GEE.
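The inference part of this workaround amounts to re-expressing each layer as ordinary arithmetic on the exported weights. The sketch below does this in plain Python for a tiny dense network with illustrative weights; in an actual GEE script, the same per-band arithmetic would be written with `ee.Image` operations:

```python
def relu(v):
    return [max(0.0, x) for x in v]

def dense(weights, bias, v):
    """Fully connected layer: out_j = sum_i w[j][i] * v[i] + b[j].
    On GEE this would be expressed as per-band image arithmetic."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]

# Weight matrices as exported from an offline-trained model (illustrative)
W1, b1 = [[0.5, -0.2], [0.1, 0.4]], [0.0, -0.1]
W2, b2 = [[1.0, 1.0]], [0.0]

def forward(pixel):
    """Inference for one pixel's feature vector through the exported net."""
    return dense(W2, b2, relu(dense(W1, b1, pixel)))

score = forward([0.8, 0.3])
```

Convolutional layers follow the same pattern but use the platform's convolution primitives (e.g., `ee.Image.convolve`), which is exactly where [225] hit the limitation that dilated convolutions are not supported.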
Novel model architectures: Both [72,147] used the GEE platform to download and process data that they could then use to train novel NN models. The authors in [147] trained a CNN called DeepWaterMapv2 that can handle flexible input sizes of optical RS imagery and evaluate images in constant runtime. Additionally, their CNN can filter out clouds to fill in obstructed scenes and predict where water is with high accuracy. The authors in [72], on the other hand, used both optical and SAR data from GEE to train a 3D U-Net model for crop-type classification. The 3D CNN architecture shows an improvement over the more traditionally used 2D convolution operations. Neither study used GEE itself for the DL part of the analysis, because NN models are not currently supported on GEE. However, their research shows that GEE makes it easy to locate data for a variety of applications.
Transfer learning (TL): TL is a powerful technique that makes models trained on large sets of data and compute available for applications without these resources. TL was initially proposed in [249] and has recently received significant attention due to advances in DL [250,251,252,253,254,255]. Inspired by humans’ capability to transfer knowledge across domains (e.g., the knowledge gained while learning the violin can help one learn the piano faster), the main idea behind TL is that it is more efficient to take a DL model trained on an (unrelated) massive image dataset (e.g., ImageNet [256]) in one domain and transfer its knowledge to a smaller dataset in another domain, instead of training a DL classifier from scratch [257]. A major assumption in many ML and DL algorithms is that models will generalize to new, unseen data drawn from the same feature space and distribution [258], and that there are universal, low-level features shared between datasets for different applications. However, this assumption does not hold for many real-world problems. For example, it is not uncommon for a classification task in one domain to lack sufficient data while a very large set of training data is available in another domain, where the data may be in a different feature space or follow a different distribution. In such situations, knowledge transfer, if done successfully, can greatly boost learning performance by avoiding expensive and labor-intensive data-labeling efforts [250]. The authors in [71] showed that TL works best when using a U-Net to map sugarcane in Thailand: the pre-trained weights resulted in the highest accuracy, F1-score, precision, and recall. The authors note that their model does not take into account phenological information, which would have required changing the NN architecture, but that this is an area for future research using their method. More work should be done to evaluate the effectiveness of TL within EO studies, as it could save large amounts of compute by avoiding constantly training DL models from scratch.
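The core TL mechanic, reusing a pre-trained feature extractor and fitting only a small task-specific head, can be sketched in a few lines. Everything here is a toy stand-in: a fixed matrix plays the role of the pre-trained extractor, and a perceptron-style rule trains the head:

```python
# "Pre-trained" feature extractor: a fixed nonlinear map (weights frozen).
FROZEN_W = [[0.7, -0.3], [0.2, 0.9]]

def extract(x):
    """Frozen feature extractor: linear map followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_W]

def train_head(data, epochs=50, lr=0.1):
    """Fit only a linear head on the frozen features (perceptron-style
    updates); the extractor weights are never touched, mirroring the TL
    idea of reusing learned low-level features."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            f = extract(x)
            pred = 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0
            err = y - pred
            w = [wi + lr * err * fi for wi, fi in zip(w, f)]
            b += lr * err
    return w, b

# Tiny, linearly separable toy task in the new domain
data = [([0.1, 0.1], 0), ([0.2, 0.0], 0), ([1.0, 0.9], 1), ([0.9, 1.0], 1)]
w, b = train_head(data)
```

In practice the frozen part is the convolutional backbone of a network such as U-Net, and fine-tuning may also unfreeze the last few backbone layers once the head has converged.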

4.3.2. Web Interface Tools to Support ML Exploration

The authors in [107] noted that there is a need to make intuitive, easy-to-use tools for specific tasks that incorporate input from the public and other stakeholders like non-governmental organizations (NGOs) and government agencies. We recommend the following two directions for future research while researchers and practitioners develop such tools.
(1) Humans-in-the-loop: As the authors in [1,116] emphasized, one big research direction we recommend is human-in-the-loop ML. Human-in-the-loop computing aims to achieve what neither a human being nor a machine can achieve alone. The authors in [259] emphasized that a human-centered understanding of ML can lead not only to more usable ML tools, but to new ways of learning computationally.
The authors in [234] used drone imagery and GEE to detect potsherds in the field in the hope of speeding up this process. They trained a CART, RF, and SVM on this drone imagery, but only the RF model produced adequate results. The authors tested their workflow in two separate locations in Greece. This research is interesting because the overall goal of the paper is not to optimize accuracy per se, or even to replace human experts in the field. As the authors note, “It is important to note here that this method does not aim to substitute archaeological fieldwalking but complement much of the non-specialist work conducted by groups of people for long periods of time in conductive environments so there is more time and resources available to dedicate to specialized work”.
To explore how GEE could be used to create an open-source processing pipeline for deforestation mapping in Liberia and Gabon, [116] used two different RF models to create data masks and then predictions for various land types. The output classification maps were then shown to local experts to correct, boosting the final accuracy rates. The authors showed that their method is more accurate than other efforts to classify deforestation rates in these two countries, though there were still some misclassifications between classes due to insufficient ground-truth data. This presents a future area of research in which ML/DL/CV models are used to generate first-order maps that are then verified by experts in the field (i.e., expert systems). Building land classification maps in this way saves experts’ time while keeping humans in the loop, so that human values and knowledge are still represented and included.
(2) Smart GEE + AI data annotator: ML, and especially DL, methods are only as good as the labeled training data they have access to. To accelerate the integration of GEE with AI to generate informative insights, the development of smart data annotators for GEE and AI is one of the most important directions to pursue. Humans and machines each have their own strengths. In a smart GEE-AI data annotation system, the classifier should be able to select the samples it is most confused about, based on its current learning status, and ask the human annotator (e.g., a domain expert) to label them; this is what AL is good at. The main idea behind AL is to take advantage of a large set of unannotated images by selecting, through an uncertainty selection strategy, which images would most improve the performance of ML/DL models and thus need annotation (see [1] for a detailed introduction to the selection strategy).
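One common AL uncertainty strategy, margin sampling, ranks unlabeled samples by the gap between their two most probable classes and sends the smallest-margin samples to the annotator first. A minimal sketch with illustrative class probabilities (not tied to any specific study above):

```python
def margin_sampling(probs, k=2):
    """Rank unlabeled samples by the margin between their two most
    probable classes; the smallest margins mark the most 'confusing'
    samples, which go to the human annotator first."""
    def margin(p):
        top = sorted(p, reverse=True)
        return top[0] - top[1]
    ranked = sorted(range(len(probs)), key=lambda i: margin(probs[i]))
    return ranked[:k]

# Illustrative per-sample class probabilities from the current classifier
probs = [
    [0.95, 0.03, 0.02],  # confident
    [0.40, 0.38, 0.22],  # confused between classes 0 and 1
    [0.34, 0.33, 0.33],  # highly uncertain
    [0.80, 0.15, 0.05],
]
to_annotate = margin_sampling(probs)  # indices of samples to label next
```

Entropy- and least-confidence-based selection follow the same pattern with a different scoring function.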
While DL receives much attention, these models still require large amounts of input data and compute to train. As compute becomes publicly available in cloud-based platforms such as GEE, obtaining large amounts of labeled training data remains the key bottleneck to using DL models. One novel way to make the data labeling process less time- and resource-intensive was illustrated in [156], where the authors used current water maps and a segmentation algorithm to automatically collect data labels from Sentinel-1 imagery. These data were then used to train variations of U-Net in an offline environment. Due to computational constraints, the authors were not able to compare their model to more traditional ML models such as an RF. Even with their automated data labeling pipeline, the authors note that their study lacked sufficient data to adapt the method to more than one country, and manual validation was still necessary after prediction.

4.3.3. Open-Source GEE-AI Library Development

One promising way to accelerate the integration of GEE and AI is the availability of open-source libraries in multiple languages (e.g., Python and R). All the studies we have investigated using GEE with DL have trained their DL models offline (detailed in Section 4.3.1), not directly on the GEE cloud computing environment.
A strong need for Python-based GEE applications/packages/frameworks: As [225] pointed out, “some specific convolution layers of DNN cannot be implemented in GEE. For example, dilated convolution layer could not be achieved due to the fact that dilation is not supported in the convolution API provided by GEE. Conversion other types of convolutions to the convolution used in this study may help to solve this problem and it needs further investigation”. To make the integration of GEE and AI seamless, we need open-source GEE and AI libraries that ensure existing AI (especially ML and DL) algorithms can be used in the GEE environment. Good examples are the Geemap [260] and Rgee [261] libraries, which make GEE’s JavaScript functions accessible to researchers who use Python and R. The authors in [35] noted that uncertainties in their underlying training dataset, a lack of subsurface soil information, and having to move between GEE and offline analysis may have contributed to errors in their analysis. Errors like these will be easier to avoid if there are more open-source Python/R libraries connecting GEE and local computers. GEE does provide JavaScript and Python APIs, but more work needs to be done to incorporate the wealth of well-tested AI algorithms already available in Python into these APIs [94].

4.3.4. Model Deployment Using GEE as Backend

Several publications have used GEE as a backend to their applications (e.g., see [262,263]), taking advantage of the parallel processing capabilities, freely available compute, and large number of datasets. Another main benefit of using GEE is the wide variety of CV and ML algorithms available. In [32], the authors built a custom expert system to map global surface water changes using the platform. Using GEE as a backend allowed them to both run their analysis and host the resulting maps in an interactive web browser. Another example is Remap [86], an application that allows users to crowd-source LULC observations while using GEE to browse data and make predictions with an RF model.

4.3.5. Vectorizing Data Boundaries

Both [72,147] implemented novel DL architectures using semantic segmentation. In the future, it would be very useful to instead design models capable of instance segmentation, so that results can be vectorized and used to create global datasets for future mapping research. Having digitized boundaries for individual ecological features is the first step to monitoring them and measuring how they change over time. While the authors in [156] demonstrated a novel way of creating data labels via segmentation algorithms, this is still semantic segmentation, and the data labels required additional verification. The authors in [233], however, showed that vectorization is possible. They first used an RF model to detect archaeological mounds, then applied an edge detection algorithm after the supervised classification to automatically digitize/vectorize boundary features. Obtaining an accuracy score before digitizing boundaries gives a higher level of confidence for using the resulting dataset in future studies.

4.4. Overarching Challenges and Opportunities

We have provided separate recommendations for future research from both application-oriented (Section 4.2) and novel/technical (Section 4.3) perspectives above. However, higher-level combinations of these application-oriented and technical recommendations will strengthen the integration of GEE and AI and thus further advance many domain areas of research. There are several opportunities for researchers and practitioners with interdisciplinary backgrounds and expertise, or for research groups with complementary expertise, to team up and work on problems at the intersection of RS and AI. For example, domain experts using ML for exploration and as an aid to human expertise (detailed in Section 4.2.2) will be significantly more productive if there are intuitive, interactive, and visual open-source web app tools to support their work. Another example is deep and careful investigation of RS sensors, imagery, and AI towards novel and effective models and algorithms tailored to RS imagery. In particular, simple but robust ML/CV methods (Section 4.3.1) will be more effective if researchers and practitioners with interdisciplinary backgrounds and expertise work towards developing RS image-specific tools, since most CV algorithms were initially designed for camera images and videos, not for remotely sensed satellite images.
One more promising overarching direction is implementing open-source web tools (Section 4.3.2) so that users without a programming background can explore and use the existing CV/ML algorithms available on GEE. This could include reimplementing both classic and state-of-the-art CV and/or ML algorithms and deploying them on GEE. A lack of models and of model flexibility are two of the most-cited limitations researchers report when using GEE (Section 4.1.2), so expanding the number and type of algorithms on GEE will allow scientists and practitioners to do their research more seamlessly on the platform. Additionally, the GEE Python and JavaScript API documentation pages are not written for domain expert users; they are made for web app developers, ML engineers, or researchers and practitioners with an interdisciplinary background. We will stop here, as we do not want to confine researchers’ and practitioners’ imagination in proposing and developing creative and effective overarching opportunities that leverage the power of GEE and AI to significantly advance various domains.

5. Conclusions

To leverage RS big data for large-scale challenges such as global climate change, intelligent methods and computation-intensive, supportive cloud platforms (including cloud storage of huge RS datasets) are critical. GEE is a pioneering platform with great potential to address both needs (i.e., AI methods and a cloud computing platform). Yet to date, many application domains (Section 3) remain at the proof-of-concept stage in leveraging GEE and AI, a trend that may relate to the platform’s steep learning curve. Overall, based on our systematic and interactive (Appendix A) review, we contend that GEE integrated with AI has great potential to provide a collaborative and scalable platform for researchers, practitioners, and policymakers to solve critically important problems in various areas. However, many challenges, and thus opportunities, remain for a deeper and more seamless integration of GEE and AI. This is especially true of the integration between DL and the GEE platform, as detailed in Section 4.2 and Section 4.3. To date, to take advantage of DL with GEE, the time-consuming training process still has to take place outside GEE: researchers and practitioners either train DL models offline on local computers or on a separate cloud computing platform (e.g., Google Cloud AI), which is often not freely available to the public. In summary, deeper and smoother integration of GEE and AI has considerable potential to address major scientific and societal challenges such as climate change and natural hazards risk management.

Author Contributions

All authors have contributed to this review paper. L.Y. initiated the review, contributed to writing and overall organization, identified selected research to include in the review, supervised the web app design and development, and coordinated input from other authors. J.D. took the lead on identifying relevant literature, contributed to writing and editing the text, and provided the data for the accompanying interactive web app. S.S. contributed to the web app design and development, word clouds visualization, and editing. Q.W. contributed to identifying selected research to include in the review and in writing part of Section 3. H.C. contributed to writing part of Section 3 and editing the whole manuscript. C.D.L. has contributed to editing. All authors have revised the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This material is based in part upon work supported by the US National Aeronautics and Space Administration under Grant number 80NSSC22K0384, and by funding from the College of Arts and Sciences at the University of New Mexico.

Acknowledgments

The authors are grateful to Gordon Woodhull for his useful UI/UX design discussion. The authors are also grateful to the three reviewers for their useful suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations (ordered alphabetically) are used in this article:
ACCA: Automated Cloud Cover Assessment
ADL: Active Deep Learning
AEZ: Agro-Ecological Zone
AI: Artificial Intelligence
AIM-RRB: Annual Irrigation Maps—Republican River Basin
AL: Active Learning
ALOS: Advanced Land Observing Satellite
ANN: Artificial Neural Network
APEI: Air Pollutant Emissions Inventory
API: Application Programming Interface
ASTER: Advanced Spaceborne Thermal Emission and Reflection Radiometer
AVHRR: Advanced Very High Resolution Radiometer
AWS: Amazon Web Services
AW3D30: ALOS World 3D—30 m
BCLL: Biodiversity Characterization at Landscape Level
BELMANIP2: Benchmark Land Multisite Analysis Intercomparison Products 2
BFAST: Breaks for Additive Season and Trend
BGT: Bagging Trees
BRT: Boosted Regression Tree
BST: Boosted Trees
BT: Bagged Trees
CART: Classification and Regression Tree
CCI-LC: Climate Change Initiative Land Cover
CBERS: China–Brazil Earth Resources Satellite
CBI: Composite Burn Index
CDL: Cropland Data Layer
CDOM: Chromophoric Dissolved Organic Matter
CGD: Crowdsourced Geographic Data
CGLS-LC100: Copernicus Global Land Cover Layer
CHELSA: Climatologies at High Resolution for the Earth’s Land Surface Areas
Chl-a: Chlorophyll-a
Colab: Google Colaboratory
CONUS: Conterminous United States
CORINE: Coordination of Information on the Environment
CNB: Continuous Naive Bayes
CNN: Convolutional Neural Network
CV: Computer Vision
CVAPS: Change-Vector Analysis in Posterior Probability Space
CE: Commission Error
CZMIL: Coastal Zone Mapping and Imaging LiDAR
DEM: Digital Elevation Model
DL: Deep Learning
DMSP NTL: Defense Meteorological Satellite Program Nighttime Lights
dNBR: Differenced Normalized Burn Ratio
DnCNN: Denoising Convolutional Neural Network
dNDVI: Differenced Normalized Difference Vegetation Index
dNDWI: Differenced Normalized Difference Water Index
DNN: Deep Neural Network
DOC: Dissolved Organic Carbon
DSM: Digital Surface Model
dSWIR: Differenced Shortwave Infrared
DT: Decision Tree
DTM: Digital Terrain Model
ELR: Extreme Learning Machine Regression
EO: Earth Observation
ESA: European Space Agency
ETM+: Enhanced Thematic Mapper Plus
EVI: Enhanced Vegetation Index
FAO: Food and Agriculture Organization
FCN: Fully Convolutional Network
FireCCI51: MODIS Fire Version 5.1
FormaTrend: Forest Monitoring for Action—Trend
FPAR: Fraction of Photosynthetically Active Radiation
FROM-GLC: Finer Resolution Observation and Monitoring of Global Land Cover
GBRT: Gradient Boosted Regression Trees
GCEV1: Global Cropland Extent Version 1
GDEM: Global Digital Elevation Map
GEE: Google Earth Engine
GeoAI: Geospatial Artificial Intelligence
GEOBIA: Geographic Object-Based Image Analysis
GeoNEX: Geostationary-NASA Earth Exchange
GFED4: Global Fire Emissions Database 4
GFSAD: Global Food Security-Support Analysis Data
GHSL: Global Human Settlement Layers
GIS: Geographic Information System(s)
GIScience: Geographic Information Science
GLCM: Gray-Level Co-occurrence Matrix
GLC 2000: Global Land Cover 2000
GLDAS: Global Land Data Assimilation System
GLOF: Glacial Lake Outburst Floods
GMM: Gaussian Mixture Model
GMTED2010: Global Multi-Resolution Terrain Elevation Data 2010
gmoMaxEnt: Maximum Entropy Classifier
GPR: Gaussian Process Regression
GREON: Great Rivers Ecological Observation Network
GSW: Global Surface Water
HAB: Harmful Algal Blooms
IKPamir: Intersection Kernel Passive Aggressive Method for Information Retrieval
INPE: National Institute for Space Research (Brazil)
IoU: Intersection over Union
IRS: Indian Remote Sensing
JRC: Joint Research Centre
KNN: K-Nearest Neighbor
LAI: Leaf Area Index
Landsat 8 OLI: Operational Land Imager
LandTrendr: Landsat-based Detection of Trends in Disturbance and Recovery
LiDAR: Light Detection and Ranging
LIP: Lake Ice Phenology
LSLTS: Large-Scale and Long Time Series
LSTM: Long Short-Term Memory
LSWI: Land Surface Water Index
LULC: Land Use and Land Cover
MAE: Mean Absolute Error
Markov-CA: Markov-based Cellular Automata
MERIT: Multi-Error Removed Improved-Terrain
MCD12C1: MODIS Land Cover Type (5.5 km)
MCD12Q1: MODIS Land Cover Type (500 m)
MCD15A3H: MODIS Terra Aqua Leaf Area Index/FPAR
MCD43A1: MODIS Bidirectional Reflectance Distribution Function (BRDF) Model Parameters
MCD43A4: MODIS Nadir BRDF-Adjusted Reflectance (NBAR)
MCD64A1: MODIS Burned Area Product
MD: Minimum Distance
MDA: Mean Decrease in Accuracy
MIoU: Mean Intersection over Union
MIrAD-US: MODIS Irrigated Agriculture
ML: Machine Learning
MLP: Multi-Layer Perceptron
MLR: Multiple Linear Regression
MNDWI: Modified Normalized Difference Water Index
MODIS: Moderate Resolution Imaging Spectroradiometer
MOD09A1: MODIS Terra Surface Reflectance (500 m)
MOD09GQ: MODIS Terra Surface Reflectance (250 m)
MOD11A2: MODIS Terra Land Surface Temperature and Emissivity
MOD13A2: MODIS Terra Vegetation Indices (1 km)
MOD13Q1: MODIS Terra Vegetation Indices (250 m)
MOD15A3: MODIS Terra Leaf Area Index/FPAR
MOD44B: MODIS Terra Vegetation Continuous Fields
MSCNN: Multiscale Convolutional Neural Network
MSI: Multispectral Instrument
MTBS: Monitoring Trends in Burn Severity dataset
MuWI-R: Multi-Spectral Water Index
MYD11A2: MODIS Aqua Land Surface Temperature and Emissivity
NAIP: National Agriculture Imagery Program
MYD13: MODIS Aqua Vegetation Indices
NASA: National Aeronautics and Space Administration
NASS: National Agricultural Statistics Service
NA: Not Applicable
NB: Naive Bayes
NDBI: Normalized Difference Built-up Index
NDVI: Normalized Difference Vegetation Index
NDWI: Normalized Difference Water Index
NEX: NASA Earth Exchange
NGA: National Geospatial-Intelligence Agency
NGTI: Normalized Difference Tillage Index
NFI: National Forest Inventory
NICFI: Norway’s International Climate and Forest Initiative
NIR: Near Infrared
NLCD: National Land Cover Dataset
NN: Neural Network
NOAA: National Oceanic and Atmospheric Administration
NS: Not Specified
NWI: National Wetland Inventory
OA: Overall Accuracy
OE: Omission Error
OLI: Operational Land Imager
OSM: OpenStreetMap
PA: Producer’s Accuracy
PB: Petabyte
Pegasos: Primal Estimated sub-GrAdient SOlver for SVM
PRODES: Amazon Deforestation Monitoring Project
PSNR: Peak Signal-to-Noise Ratio
QA60: Sentinel-2 Quality Assurance Bitmask Cloud Band
QRF: Quantile Regression Forest
RBR: Relativized Burn Ratio
RF: Random Forest
RFVC: Relative Fractional Vegetation Cover
RGB: Red-Green-Blue
RHSeg: Recursive Hierarchical Segmentation
RMSE: Root Mean Square Error
ROC: Receiver Operating Characteristic curve
RRMSE: Relative Root Mean Square Error
RS: Remote Sensing
RTC: Radiometric Terrain Correction
RUESVM: Random Under-sampling Ensemble of Support Vector Machines
RVM: Relevance Vector Machine
SAE: Stacked AutoEncoder
SAR: Synthetic Aperture Radar
SATVI: Soil Adjusted Total Vegetation Index
SAVI: Soil Adjusted Vegetation Index
SDS: Satellite Derived Shoreline
SEN12MS-CR: Sentinel 1 and 2 Multi-Spectral Cloud Removal dataset
SNIC: Simple Non-Iterative Clustering
SPOT: Satellite pour l’Observation de la Terre
SRTM: Shuttle Radar Topography Mission
SSIM: Structural Similarity Index
SSS: Sea Surface Salinity
SST: Sea Surface Temperature
Suomi-NPP NTL: Suomi National Polar-orbiting Partnership Nighttime Lights
SVM: Support Vector Machine
SWIR: Shortwave Infrared
TB: Terabyte
TIR: Thermal Infrared
TL: Transfer Learning
TM: Thematic Mapper
TRMM: Tropical Rainfall Measuring Mission
UA: User’s Accuracy
UAS: Unoccupied Aircraft Systems
UN-GGIM: United Nations Initiative on Global Geospatial Information Management
USDA: United States Department of Agriculture
USGS: United States Geological Survey
VHR: Very High Resolution
VIIRS NTL: Visible Infrared Imaging Radiometer Suite Nighttime Lights
WUDAPT: World Urban Database Access and Portal Tools

Appendix A. The Accompanying Interactive Web App Tool for the Literature of GEE and AI

In Section 1.1 and Section 3.1, we provided a brief map and graphic summary of the 200 papers covered in this review. To allow readers to search for literature relevant to their research interests and to draw more useful, dynamic information and insights from the papers reviewed, we have developed an interactive web app called iLit4GEE-AI (https://geoair-lab.github.io/iLit4GEE-AI-WebApp/index.html (accessed on 1 May 2022)). On our site, you will find:
  • A brief web app demo video: the video link is accessible on the web app page (top-right corner);
  • Acronyms used in the data table of the web app, as well as explanations for each data field and chart (also in the top-right corner);
  • A plan to continuously update and maintain the web app: to better serve the RS/GEE researcher and practitioner community, as well as AI engineers who would like to contribute to RS and GEE, we will continue to update the data to include new GEE + AI literature as it is published. Even after this paper is published, we hope this web app will serve as one place to keep track of a comprehensive and up-to-date list of GEE + AI literature. Going forward, the data on the web app will be maintained and continually updated by members of the GeoAIR Lab (Geospatial Artificial Intelligence Research and Visualization Laboratory). Our web app is data-driven and scalable (i.e., once the data are updated, the web app automatically syncs and updates the visualization and filtering functions on the site).

Appendix B. Evaluation Metrics

For the most commonly used evaluation metrics in the context of combined GEE, AI, and RS literatures, see Appendix C of our recent paper [264] at https://doi.org/10.3390/s22062416 (accessed on 1 May 2022).
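For quick reference, the three accuracy metrics used most often in the studies reviewed here (OA, PA, and UA; see the Abbreviations) can be computed directly from a confusion matrix. The following is a minimal, stdlib-only sketch; the example matrix is illustrative and not drawn from any reviewed paper:

```python
def accuracy_metrics(cm):
    """Compute OA, per-class PA (recall), and per-class UA (precision)
    from a square confusion matrix where cm[i][j] is the count of
    reference class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    diag = sum(cm[i][i] for i in range(n))
    oa = diag / total                                    # Overall Accuracy
    pa = [cm[i][i] / sum(cm[i]) for i in range(n)]       # omission error = 1 - PA
    ua = [cm[i][i] / sum(cm[j][i] for j in range(n))     # commission error = 1 - UA
          for i in range(n)]
    return oa, pa, ua

# Illustrative two-class example (e.g., crop vs. non-crop)
cm = [[45, 5],
      [10, 40]]
oa, pa, ua = accuracy_metrics(cm)
# oa = 0.85, pa = [0.9, 0.8], ua = [45/55, 40/45]
```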

Appendix C. Textual Summaries for Advances in Applications

To keep the main body of the paper concise while still providing a comprehensive summary of application domains that leverage GEE and AI, this appendix provides textual summaries for selected studies in each of the application areas covered in Section 3.2.1 through Section 3.2.18.

Appendix C.1. Textual Summaries for Crop Mapping

Landsat-8 and Sentinel-2 imagery were combined in [47] with elevation data to produce a crop map across continental Africa on the GEE platform. The crop extent map was produced by combining the output of a Recursive Hierarchical Segmentation (RHSeg) object-oriented segmentation with either RF or SVM pixel-based classification to reduce the “salt and pepper” noise that pixel-based models produce on their own. The final, open-source data product was compared to other commonly used crop maps. However, the method relies on optical imagery, and obtaining cloud-free, continuous scenes for the entire African continent proved difficult. A two-step approach for crop identification in the central region of Ukraine was developed by the authors in [52] by exploiting intra-annual variation in the temporal signatures of remotely sensed observations (Sentinel-1 and Landsat images) and prior knowledge of crop calendars. Landsat-based time-series metrics capturing within-season phenological variation were first preprocessed. The developmental stage of each crop was modeled by fitting a harmonic function, which was then used to automatically generate training samples. Three classification methods (SVM, RF, and decision fusion) were used for pixel-wise classification; the SVM classifier yielded the lowest accuracy, and integrating multispectral and SAR data improved the classification accuracy. The authors in [67] collected large numbers of training points from Google Earth imagery and analyzed Landsat and DEM data to create a cropland data layer across Europe, the Middle East, and Russia. Their results compared favorably to existing data products such as United Nations Food and Agriculture Organization (FAO) estimates while relying only on open-source data, and they released their code for the GEE platform. The authors were also able to distinguish between crop subtypes like agriculture and agroforestry, a common problem for many cropland data products.
In addition, [67] showed that across regions, NDVI, NDWI, and slope were good predictors for various crop labels, while blue and SWIR1 were not. While the authors achieved good results across a wide area, their processing pipeline, and thus their results, relied on relatively cloud-free Landsat data; in the future, a harmonized Landsat–Sentinel data product would increase data availability and further improve results. Lastly, the authors noted that while the data gathering process was time- and resource-intensive, future projects that crowdsource or pool data products together would save time and effort.
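The harmonic modeling of crop phenology used in [52] can be illustrated with a toy, stdlib-only sketch. For evenly spaced samples spanning one full cycle, the first-order harmonic least-squares fit reduces to discrete Fourier sums; a real implementation would fit a general least-squares model to irregular acquisition dates. The NDVI series below is synthetic:

```python
import math

def first_harmonic(ndvi):
    """Fit ndvi[t] ~ a0 + a1*cos(w*t) + b1*sin(w*t) for N evenly spaced
    samples spanning one full cycle; in this special case the discrete
    Fourier sums give the exact least-squares solution."""
    n = len(ndvi)
    w = 2 * math.pi / n
    a0 = sum(ndvi) / n
    a1 = 2 / n * sum(v * math.cos(w * t) for t, v in enumerate(ndvi))
    b1 = 2 / n * sum(v * math.sin(w * t) for t, v in enumerate(ndvi))
    # Amplitude and phase summarize the seasonal NDVI cycle
    amplitude = math.hypot(a1, b1)
    phase = math.atan2(b1, a1)
    return a0, amplitude, phase

# Synthetic 12-"month" NDVI curve peaking mid-season (t = 6)
series = [0.3 + 0.25 * math.cos(2 * math.pi * (t - 6) / 12) for t in range(12)]
mean, amp, phase = first_harmonic(series)
# mean ~ 0.3, amp ~ 0.25, |phase| ~ pi (peak is half a cycle from t = 0)
```

A per-crop fit like this yields a mean level, amplitude, and timing of peak greenness, which is the kind of phenological signature [52] used to generate training samples automatically.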
Over a three-year period, the authors in [75] mapped paddy rice using Sentinel imagery by utilizing several different spectral indices and creating composites of different paddy rice growth periods. Their results were highly accurate in three separate areas. The authors shared their code on GEE and showed that their open-source analysis agreed well with maps previously produced by government agencies. However, they noted that their method still depended on finding cloud-free optical RS imagery and/or adequate cloud masking algorithms. In [68], the authors proposed a paddy rice area extraction approach combining optical vegetation indices and SAR data. Sentinel-1A SAR and Sentinel-2 MSI Level-2A imagery were used to identify paddy rice. Three vegetation indices, namely NDVI, EVI, and the land surface water index (LSWI), were estimated from the optical bands, and two polarization bands from Sentinel SAR imagery were used as a supplement to overcome cloud contamination. This approach was applied with the RF algorithm to the Jianghan Plain in China as an experimental area. The authors in [71] used a U-Net to map sugarcane in Thailand but used a lightweight NN as the encoder for the DL model to reduce compute costs. They tested the network architecture using the RGB channels with pre-trained weights, the RGB channels with randomly initialized weights, and the RGB and NIR channels with randomly initialized weights. Because DL models were not supported by GEE at the time, the authors used Google Cloud, GEE, and the Google AI Platform together to preprocess their data and train their models. They showed that transfer learning worked best (i.e., the pre-trained weights resulted in the highest accuracy, F1-score, precision, and recall). The authors noted that their model did not take into account phenological information, which would have required changing the NN architecture, but that this is an area for future research using their method.
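The three optical indices used in [68] are simple band arithmetic. The following is a minimal sketch using the standard formulas; the reflectance values are illustrative, and in GEE these would be per-pixel image operations rather than scalar functions:

```python
def ndvi(nir, red):
    # Normalized Difference Vegetation Index
    return (nir - red) / (nir + red)

def evi(nir, red, blue):
    # Enhanced Vegetation Index with the standard coefficients
    # (G = 2.5, C1 = 6, C2 = 7.5, L = 1)
    return 2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1)

def lswi(nir, swir1):
    # Land Surface Water Index (NIR vs. shortwave infrared)
    return (nir - swir1) / (nir + swir1)

# Illustrative surface reflectances for a vegetated pixel
nir_v, red_v, blue_v, swir1_v = 0.40, 0.05, 0.03, 0.15
v = ndvi(nir_v, red_v)    # 0.35 / 0.45
w = lswi(nir_v, swir1_v)  # 0.25 / 0.55
```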
Shade-grown coffee landscapes are critical to biodiversity in the forested tropics, but mapping them is difficult because of mountainous terrain, cloud cover, and spectral similarity to more traditional forested landscapes. The authors in [50] used Landsat, precipitation, and DEM data to map shade-grown coffee in Nicaragua with a RF model. They reported high accuracy scores across different land class types (including shade-grown coffee) and also assessed the relative importance of the variables contributing most to the RF model’s performance. More specifically, [50] performed an ablation study comparing model performance as the number of features available to the model increased. They found that elevation was the most important factor, followed by the correlation between precipitation and NDVI, temperature, and slope; seasonal information helped as well. The authors noted that high-resolution data would help boost accuracy metrics in this classification task, but that increased accuracy did not directly translate into improved socio-cultural or economic relationships in the region of study. The authors in [57] mapped corn at a 10-m resolution using multitemporal SAR and optical images. Several metric composites were calculated, including monthly and percentile composites for Sentinel-1 images and percentile and interval mean composites for Sentinel-2 images, which were used as input to the RF algorithm on the GEE platform. To avoid speckle noise in the classification results, the pixel-based classification result was integrated with object segmentation boundaries produced in eCognition software to generate an object-based corn map according to crop intensity. In [78], the authors explored the differences between Landsat and Sentinel imagery for identifying cotton in China over the course of the plant’s life cycle. They found that Landsat data performed slightly better than Sentinel optical imagery, perhaps due to compute constraints on GEE: not all of Sentinel’s input bands could be used and vegetation indices could not be calculated, so Sentinel’s full potential may not have been exploited. However, of the three years of RS data analyzed, the authors used Sentinel imagery for only one year, making the results for the two datasets not directly comparable. Importantly, though, the authors examined the types of error that different input datasets produced, finding, for example, that small dirt roads were more distinguishable from cotton fields in Sentinel imagery than in Landsat imagery.
The authors in [66] showed that by using climate and soil data with RS imagery on the GEE platform, it was possible to predict winter wheat yields 1–2 months ahead of harvest in China. They utilized many of the ML algorithms available on GEE, compared specific time windows for phenological analysis, and found that the closer the data came to planting and harvesting time, the better the ML models performed. Still, uncertainties from data resolution and human activity affected the models’ ability to predict with high accuracy across agricultural zones.
Crop maps are often created using vegetation indices and field observation data. The authors in [73] argued that this may lead to datasets and ML models that can only predict in specific areas and do not generalize to larger areas (i.e., regions or countries) or to other time periods in the same area. They further argued that what is needed is a more generalized method that can take in information such as weather, climate, or DEM data and scale up to field-level predictions or beyond. The authors compared a RF to three different DL models (a DNN, a 1D CNN, and an LSTM) for predicting wheat yield in China. The DNN and RF performed best over large areas, and the RF model often had the best overall performance. This is important to note because RFs often perform comparably to or better than DL models while using much less compute to train; however, this result could also be due to the small size of the authors’ dataset, meaning the DL models could not train on enough data to merit their use. The authors ran a variable feature importance analysis with the RF model across different years and months within their data and showed that elevation, latitude, soil, and vegetation indices were the most important inputs, while weather and climate data were the least important. The authors in [76] utilized GEE, Sentinel-2, and field data to train a RF to estimate LAI and FPAR at a much finer spatial scale. Their LAI and FPAR maps matched well with field observations and, when spatially aggregated to match the resolution of the MODIS LAI/FPAR product, were in good agreement there, too. However, their method assumed static land cover classes over a three-year span, so future work could potentially boost the method’s accuracy by verifying that land cover was not in fact changing over this period.
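The variable importance analyses that recur throughout these studies follow a simple principle: shuffle one input column and measure how much a model’s score drops. The following is a generic, stdlib-only illustration of that permutation idea; the toy “model” and data are invented for the example and are not the RF importance measures the reviewed papers used:

```python
import random

def permutation_importance(model, X, y, feature, metric, seed=0):
    """Permutation-style importance: shuffle one feature's column and
    measure the drop in model quality; larger drops mean the model
    relies more heavily on that feature."""
    base = metric(model(X), y)
    col = [row[feature] for row in X]
    random.Random(seed).shuffle(col)
    X_perm = [row[:feature] + [v] + row[feature + 1:] for row, v in zip(X, col)]
    return base - metric(model(X_perm), y)

# Toy stand-in for a trained classifier: thresholds feature 0, ignores feature 1
model = lambda X: [int(row[0] > 0.5) for row in X]
accuracy = lambda pred, y: sum(p == t for p, t in zip(pred, y)) / len(y)

X = [[0.1, 0.9], [0.2, 0.1], [0.8, 0.4], [0.9, 0.7]] * 5
y = [0, 0, 1, 1] * 5
drop0 = permutation_importance(model, X, y, 0, accuracy)  # typically positive
drop1 = permutation_importance(model, X, y, 1, accuracy)  # exactly 0.0: the model ignores it
```

In practice one would average the drop over many shuffles; libraries such as scikit-learn provide this out of the box.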
The authors in [49] produced annual irrigation maps (1999–2016) in the US Northern High Plains by combining all available Landsat satellite imagery with climate and soil covariables in a RF classification workflow. In total, 9 Landsat variables and 11 covariables were generated for use in the machine learning classification. To understand the relative contribution of input variables to classification accuracy, permutation tests and Gini index metrics were run in R with an identically parameterized classifier, since GEE did not output variable importance measures at the time of the study. Two novel indices that integrate plant greenness and moisture information ranked highest for both importance metrics, warranting further study for use in irrigation classification in other agricultural regions. Statistical modeling suggested that precipitation and commodity prices influenced irrigated extent through time. This method relied on manually produced training and test datasets well suited to identifying areas where irrigation clearly enhances greenness. The authors in [51] implemented an automatic irrigation mapping procedure in GEE that uses surface reflectance imagery from different sensors (Landsat 7/8, Sentinel-2, MODIS Terra and Aqua imagery, and the SRTM DEM). The approach integrated, in a novel way, unsupervised object-based image segmentation, unsupervised pixel-by-pixel classification, and multi-temporal image analysis to distinguish productive irrigated fields from non-productive and non-irrigated areas. The combination of these techniques enabled the detection of irrigated areas without requiring any reference cropland data to train the mapping algorithm. The authors in [58] developed a rapid method to map Landsat-scale (30 m) irrigated croplands across the conterminous United States (CONUS).
The method was based on automatic generation of training samples for most areas under the assumptions that irrigated crops appear greener than non-irrigated crops and have limited water stress. Two intermediate irrigation maps were generated by segmenting Landsat-derived annual maximum greenness and Enhanced Vegetation Index (EVI) using county-level thresholds calibrated from an existing coarse-resolution irrigation map. Random samples extracted from the training pool, along with RS-derived features and climate variables, were then used to train ecoregion-stratified RF classifiers for pixel-level classification. Evaluation of feature importance indicated that Landsat-derived features played the primary role in classification in relatively arid regions, while climate variables were important in the more humid eastern states.
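The greenness-threshold labeling rule that underlies this kind of automatic training-sample generation can be sketched in a few lines. This is a much-simplified illustration: the threshold and pixel values here are invented, whereas in [58] the thresholds were calibrated per county from an existing coarse-resolution irrigation map:

```python
def label_candidates(pixels, evi_max_threshold=0.55):
    """Label pixels as candidate 'irrigated' training samples when their
    annual maximum EVI exceeds a calibrated greenness threshold.
    Each pixel is a dict with an 'evi_max' entry; in a real workflow the
    other per-pixel features would ride along into the RF classifier."""
    labeled = []
    for p in pixels:
        label = "irrigated" if p["evi_max"] > evi_max_threshold else "non-irrigated"
        labeled.append({**p, "label": label})
    return labeled

# Two illustrative pixels: a lush field and a dry one
pixels = [{"id": 0, "evi_max": 0.72}, {"id": 1, "evi_max": 0.41}]
result = label_candidates(pixels)
# result[0]['label'] == 'irrigated', result[1]['label'] == 'non-irrigated'
```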
The authors in [46] compared several algorithms on the GEE platform (CART, IKPamir, logistic regression, a MLP, NB, RF, and an SVM) for crop-type classification in Ukraine. They also used an ensemble NN but had to move off the GEE platform since NNs were not supported. The ensemble NN performed best of all the models, although the authors noted that the SVM algorithms were not working on the GEE platform. More generally, they found that the algorithms on GEE were not very flexible and that some preprocessing steps, such as dealing with missing data, were difficult to implement, so all preprocessing took place outside of GEE. The authors found that atmospherically corrected Landsat data boosted model performance more than Landsat composite data. Looking forward, [46] suggested that optical imagery in conjunction with SAR data, or combining data from multiple RS platforms, would help boost performance. The authors in [72] combined optical and SAR Sentinel data to create higher-resolution maps capable of displaying information on less commonly mapped non-staple crops in the US. First, they denoised their SAR data with a CNN and then fused it with optical RS imagery. These data were used to train a RF as well as three separate DL models: SegNet, U-Net, and a 3D U-Net. The authors showed that fusing optical and SAR data worked better than using optical data alone, that using denoised SAR data in the fusion process led to higher accuracy scores, and that the best model was the 3D U-Net trained on the fused optical and denoised SAR data. However, an interesting finding was that the RF performed best when using optical information alone. The authors trained their DL models offline, as NNs were not supported on GEE at the time.
The authors mentioned that the extremely high accuracy rates of the 3D U-Net model might indicate overfitting, and that when required training times are taken into account, the RF model performed well while using the least compute across all datasets. Lastly, this paper used semantic segmentation, but future research in the field should investigate instance segmentation. Optical imagery is used in many EO analyses because it is comparable to how humans see and is easy to interpret; however, it is often blocked by clouds, limiting its utility. SAR imagery works day or night regardless of cloud cover, so [74] used it for crop classification while testing input composite length and ML classification performance. The authors compared an object-oriented classification method combining the SNIC algorithm with a RF against a pixel-based method using the RF by itself. They found that adding SNIC to their processing routines smoothed the data before it was fed into the RF model, ultimately boosting accuracy by more than 10% in their study. They also showed that shorter time periods were more useful for making composites for classification, most likely because plants look very different over the course of a growing season. However, the authors noted that their method worked better for larger cropland areas and might not generalize to areas with smaller field sizes. The authors in [56] compared the performance of an ANN to CART, RF, and SVM models on GEE for sugarcane mapping in China using Sentinel-2 imagery. They found that the SVM performed best, and then went on to show which types of errors each model made; for example, the ANN tended to overfit the data and give too much preference to the sugarcane class, while tree-based models confused the forest and water classes. The authors then incorporated NDVI information into the SVM to show how the model performed with this extra information.
To improve on these results, the authors suggested that using SAR data would help remove the impact of shadows on classification errors. The authors in [19] created an open-source map for several West African countries using a RF model trained on Landsat data. Their map was moderately more accurate than other maps produced for the region, and they went further by demonstrating the difference between feature importances based on wet and dry seasons for their countries of analysis. The authors used GEE for processing data but needed to train their model offline because the GEE RF implementation was not flexible enough for their analysis. Papers like this one illustrate a trend that GEE is facilitating: researchers now have freely available compute and are moving away from local, small-scale classifications and towards regional, national, and even global classification tasks.
The authors in [48] developed and implemented an automated cropland mapping algorithm (ACMA) using MODIS 250-m, 16-day NDVI time-series data. A web-based in situ reference data repository was first developed to collect ground data through field visits, very high spatial resolution imagery (sub-meter to 5-m), and community crowdsourcing. A comprehensive knowledge base was then established for Africa using the web repository. Second, clustered classes from each of the eight agro-ecological zones (AEZs), generated using the k-means algorithm, were grouped through quantitative spectral matching techniques (QSMTs), and each group of similar cluster classes was matched with ideal spectra to identify and label classes. This process produced a reference cropland layer for the year 2014 (RCL2014) for the entire African continent consisting of five crop products (cropland extent and areas; irrigated versus rainfed croplands; cropping intensities; crop type and/or dominance; and croplands versus cropland fallows). Third, decision tree (DT) algorithms were established for the eight AEZs based on the RCL2014 knowledge base and subsequently composed into an ACMA applicable to the entire African continent. Finally, the ACMA algorithm was deployed on GEE and applied to MODIS data from 2003 through 2014 to produce annual ACMA-generated cropland layers. Agriculture and Agri-Food Canada (AAFC) has been responsible for producing Annual Space-Based Crop Inventory (ACI) maps for Canada. The 30-m ACI maps were created by applying a decision tree method to optical (e.g., Landsat) and SAR (e.g., Radarsat-2) data. With the goal of producing ACI maps more effectively and efficiently, the authors in [69] developed an object-based method (i.e., Simple Non-Iterative Clustering (SNIC)) for producing ACI maps based on Sentinel-1 SAR data and Sentinel-2 optical data. The GEE platform and an ANN were used to produce an ACI map for 2018, with a reported OA of 77%. Even though this OA was slightly lower than that of the AAFC’s ACI maps, the authors argued that their proposed GEE method is promising due to its superior computational efficiency.

Appendix C.2. Textual Summaries for Land Cover Classification

The authors in [80] used a RF model to map land-use classes such as vegetation, croplands, and urban areas from Landsat imagery in Zambia. They noted that the GEE platform made their workflow more flexible, enabling this type- and place-specific land cover application; however, they had to leave the GEE platform to create verification points for the ML training process. The authors compared their maps to other commonly used land cover maps like GlobCover, GLC 2000, and GFSAD and noted the similarities and differences. The authors in [81] presented an approach to quantify land cover and impervious surface changes over continental Africa for 2000–2015 using Landsat images and a RF classifier on GEE. Landsat spectral bands, NDVI, NDWI, and night-time light data served as predictor variables; this study relied on visual inspection of high-resolution imagery to produce training data. The authors in [82] proposed a land-use/land-cover discrimination method based on a CART, applied change-vector analysis in posterior probability space (CVAPS) and the best-histogram maximum entropy method for change detection, and further improved the accuracy of the land-updating results in combination with NDVI timing analysis. Selecting western China as the research area and using GEE’s JavaScript API, they obtained a 2014 land map based on the ESA GlobCover 2009 dataset. A total of 1000 verification points were selected for visual interpretation in Google Earth, and a program written with Node.js and JavaScript was developed to randomly generate validation points and an auxiliary rectangle. The transfer error matrix analysis showed that the overall accuracy of the land map from the proposed CART-CVAPS-NDVI method was 78.6–88.2%. The authors in [93] designed an integrated workflow on GEE for Iran using Sentinel-1 and -2 data, a RF model, and SNIC.
With ground-truth training samples available, the authors used SNIC to segment land-use classes into objects while the RF model classified them at the pixel level. Afterwards, visual assessment was used to verify the majority voting between the two classifiers for 13 different land-use classes. While there was some confusion between similar classes (e.g., water and marshland), this analysis resulted in a much higher-resolution, much more accurate land-use map of Iran than the 2016 map. However, the authors noted that GEE limited their study in some ways: for example, SNIC was the only segmentation algorithm available on GEE. Additionally, because of computational limits on the platform, only so many training samples could be included, and input features had to be chosen carefully before being fed to a ML model.
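The object/pixel fusion step described for [93] (assigning each segment the majority class of the pixel-level predictions it contains) can be sketched in a few lines. The segment IDs and class labels below are illustrative; in GEE this would operate on SNIC cluster IDs and a per-pixel classification band:

```python
from collections import Counter

def majority_vote_by_segment(segment_ids, pixel_classes):
    """For each segment, return the most frequent pixel-level class,
    smoothing 'salt and pepper' noise in the per-pixel classification."""
    votes = {}
    for seg, cls in zip(segment_ids, pixel_classes):
        votes.setdefault(seg, Counter())[cls] += 1
    return {seg: counter.most_common(1)[0][0] for seg, counter in votes.items()}

# Two segments; segment A contains one noisy 'water' pixel inside cropland
segments = ["A", "A", "A", "A", "B", "B"]
classes = ["crop", "crop", "water", "crop", "urban", "urban"]
fused = majority_vote_by_segment(segments, classes)
# fused == {'A': 'crop', 'B': 'urban'}
```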
The authors in [83] utilized Landsat images available through GEE to map annual land-use changes in China’s poverty-stricken areas. Landsat 8 images from 2013–2018 were preprocessed and then used to compute spectral indices (e.g., NDVI, Normalized Difference Built-up Index (NDBI), MNDWI). Night-time data were also included to improve the extraction of built-up areas. A RF classifier was then trained and used to perform land-use classification in poverty areas. The results revealed significant variations in land-use change among the poverty areas in China. Some poverty areas had more intense construction activities than others. The authors mentioned some limitations of GEE, for example, the low computational efficiency of vector data. Uploading data to GEE or exporting data from GEE can be time-consuming. The authors in [87] set out to create an open-source land cover mapping processing pipeline using GEE. They argued that land cover maps specifically can help countries properly plan for sustainable levels of food production, but that many developing countries did not have the financial or compute resources to monitor land classes in real time. Using SVM and bagged trees (BT) models, the authors predicted urban, agriculture, tree, vegetation, water, and barren land-use types in Lesotho. However, the authors had low accuracy rates across most classes. During the ML training process, the authors ultimately had to leave the GEE platform because of “out-of-computation” time errors in the code editor.
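The spectral indices mentioned above (NDVI, NDBI, MNDWI) all share the same normalized-difference form; a minimal sketch follows, with illustrative reflectance values rather than real pixel data:

```python
def normalized_difference(a, b):
    """Generic normalized-difference index: (a - b) / (a + b)."""
    return (a - b) / (a + b)

# For Landsat 8: NIR = B5, Red = B4, SWIR1 = B6, Green = B3 (band names
# for orientation only; inputs here are plain reflectance values).
def ndvi(nir, red):
    return normalized_difference(nir, red)

def ndbi(swir1, nir):
    return normalized_difference(swir1, nir)

def mndwi(green, swir1):
    return normalized_difference(green, swir1)

veg = ndvi(0.40, 0.10)   # high values indicate vegetated pixels
wet = mndwi(0.30, 0.05)  # positive values suggest open water
```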
The authors in [88] collected a multi-seasonal sample set for global land cover mapping in 2015 from Landsat 8 images. The concept of “stable classification” was used to approximate how much reduction in training samples, and how much land cover change or image interpretation error, could be tolerated. Using a RF algorithm with 200 trees, a numerical experiment showed that less than 1% overall accuracy was lost when less than 40% of the total global training sample set was used, when 20% of the global training sample points were in error, or even when the land cover changed by 20%. With this knowledge in mind, the authors transferred their 2015 global training sample set at 30-m resolution to 10-m resolution Sentinel-2 images acquired in 2017 and produced a 10-m resolution global land cover map.
Feature engineering can lead to higher accuracies in EO analyses when using ML. However, it is difficult to know beforehand which features will be useful to a model, even with expert domain knowledge in a given area. Thus, the authors in [105] tested the difference in model performance when using single image mosaics, time series RS imagery, statistical features (median, standard deviation), band ratios, or all of the features listed. They tested this by training a RF model on each subset of data to create LULC maps in Brazil. The authors found that inputting a time series of the data was the most accurate, more accurate even than using all of the data. This research showed that more data was not always better and that feature engineering did not always lead to better model performance despite the increased compute cost. The authors in [92] trained several different ML models available on GEE with different combinations of input data to determine which were the most important in determining land-use types in Golden Gate Highlands National Park in South Africa. The authors compared combinations of different band ratios, elevation, aspect, and slope data and found that including SWIR data in their analysis reduced classification errors in areas with sparse vegetation. Different models were able to capture different land-use types. For example, SVMs better distinguished between urban and agricultural lands, while the RF model was better at identifying forested landscapes, suggesting that different types of models may be suitable for different tasks. Even though OA rates were high for the best models, most models still had issues telling bare or rocky landscapes apart from drier vegetation. The authors in [95] set out to compare the contribution of SAR data and different indices (NDVI, EVI, SAVI, NDWI) derived from optical data to overall classifier performance.
They found that including SAR data moderately improved performance, while only NDWI gave the ML model a significant performance enhancement. The authors still struggled to classify vegetation subtypes like shrubs, grasslands, and aquatic vegetation, but their accuracy rates matched those of common LULC maps like Finer Resolution Observation and Monitoring of Global Land Cover 30 m (FROM-GLC30) and GlobeLand30. This work contributed to a growing body of literature attempting to empirically show which input data types can help identify which LULC classes using RS and ML. The researchers in [98] generated a land cover map of the whole African continent at 10 m resolution, using multiple data sources including Sentinel-2, Landsat-8, Global Human Settlement Layer (GHSL), Night Time Light (NTL) Data, SRTM, and MODIS Land Surface Temperature (LST). Different combinations of data sources were tried to determine the best data input configurations. It was found that there was always an increase in accuracy when new data were introduced. They also conducted an investigation of the importance of individual features derived from a RF classifier. A transferability analysis experiment was designed to study the influence of sampling strategies on the land cover mapping performance. It was suggested that training samples of natural land cover classes should be collected from areas covering each main Köppen climate zone for African land cover mapping and other similar tasks. Different data sampling strategies and their effects on how different ML classifiers performed on LULC tasks were compared in [101]. The authors trained a Relevance Vector Machine (RVM) offline in addition to the CART, RF, and SVM models on GEE. For their particular LULC application, stratified proportional random sampling led to higher overall accuracy scores than stratified equal random sampling or stratified systematic sampling, and the RF model performed better than the CART, RVM, and SVM.
However, their study lacked ground truth data, so the authors needed to use existing land cover maps for data collection purposes. As a result, even the best model (RF) had trouble recognizing classes without many samples, leading to low class accuracies.
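Statistical temporal features of the kind compared in these studies (e.g., per-pixel median and standard deviation over an image time series) can be sketched as follows; the NDVI observations are hypothetical:

```python
import statistics

def temporal_features(series):
    """Per-pixel statistical features computed over an image time series."""
    return {
        "median": statistics.median(series),
        "stdev": statistics.pstdev(series),
        "range": max(series) - min(series),
    }

# Hypothetical NDVI observations for one pixel across a growing season
ndvi_series = [0.21, 0.35, 0.62, 0.58, 0.30]
feats = temporal_features(ndvi_series)  # feats["median"] == 0.35
```

In GEE, the equivalent operation would typically be an `ImageCollection` reducer applied per pixel; this sketch only shows the arithmetic.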
The authors in [96] proposed a hybrid data balancing method, called the Partial Random Over-Sampling and Random Under-Sampling (PROSRUS), to resolve the class imbalance issue. PROSRUS used a partial balancing approach with hundreds of fractions for majority and minority classes to balance datasets. The reference samples were generated using visual interpretation of very high spatial resolution images of Google Earth. It was observed that PROSRUS had better performance than several other balancing methods and increased the accuracy of minority classes without a reduction in overall classification accuracy. It was noted, though, that every dataset requires a specific balancing ratio to obtain the optimal result because the imbalance ratios and complexity levels are different for different datasets. It also showed that topographic data including elevation, slope, and aspect had higher impacts than spectral indices in improving the accuracy of LC maps. The authors in [97] proposed a new method integrating random under-sampling of majority classes and an ensemble of Support Vector Machines, namely Random Under-sampling Ensemble of Support Vector Machines (RUESVMs). Specifically, the RUESVMs method created an ensemble of SVM classifiers, each of which was trained on a randomly under-sampled subset of the original imbalanced data based on the defined fractions, and finally combined the output of the SVM classifiers using majority voting. The performance of RUESVMs for LC classification was evaluated in GEE over two case studies using Sentinel-2 time-series data and five well-known spectral indices. The results showed that the RUESVMs method considerably outperformed the other benchmark methods. It not only increased the accuracy of minority classes, but also increased the accuracy of majority classes.
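The under-sampling-plus-majority-voting idea behind RUESVMs can be illustrated with a small, self-contained sketch. A nearest-centroid learner stands in for the SVM base classifiers, and the data, class names, and sampling fraction are all hypothetical:

```python
import random
from collections import Counter

def undersample(X, y, majority, fraction, rng):
    """Keep every minority sample; keep a random `fraction` of the majority class."""
    kept = [(x, lab) for x, lab in zip(X, y) if lab != majority]
    maj = [(x, lab) for x, lab in zip(X, y) if lab == majority]
    kept += rng.sample(maj, max(1, int(len(maj) * fraction)))
    return [x for x, _ in kept], [lab for _, lab in kept]

def centroid_fit(X, y):
    """Nearest-centroid stand-in for an SVM base learner (1-D features)."""
    sums, counts = {}, {}
    for x, lab in zip(X, y):
        sums[lab] = sums.get(lab, 0.0) + x
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: sums[lab] / counts[lab] for lab in sums}

def ensemble_predict(X, y, x_new, n_models=5, fraction=0.4, seed=0):
    """Train each base learner on an under-sampled subset; majority-vote the labels."""
    rng = random.Random(seed)
    majority = Counter(y).most_common(1)[0][0]
    votes = []
    for _ in range(n_models):
        Xs, ys = undersample(X, y, majority, fraction, rng)
        model = centroid_fit(Xs, ys)
        votes.append(min(model, key=lambda lab: abs(model[lab] - x_new)))
    return Counter(votes).most_common(1)[0][0]

# Hypothetical imbalanced 1-D data: many "soil" pixels, few "water" pixels
X = [0.10, 0.15, 0.20, 0.25, 0.30, 0.80, 0.85]
y = ["soil"] * 5 + ["water"] * 2
label = ensemble_predict(X, y, 0.78)  # "water"
```

Each base learner sees a balanced view of the data, so the minority class is not drowned out, which is the effect the RUESVMs results describe.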
Aiming to resolve the lack of training samples for dynamic global land cover mapping efforts, [99] developed an automatic training sample migration method based on the first all-season sample set (FAST) in 2015 (Li et al., 2017) and all available Landsat 5 TM archives in GEE. Spectral similarity and spectral distance measures were calculated between the reference spectra and target spectra. Threshold values were determined to indicate a land cover change in a pixel. EO analyses making use of ML are often limited by the number of labeled training samples available in a given domain. The authors in [104] created a training set by pairing Landsat imagery with a MODIS LULC map as labels. This allowed them to train CART and RF classifiers in both Australia and the United States, though their results indicated that, because of their small dataset, both models were overfitting on the training set compared to the test set. While determining ecosystem service values is complicated (it involves many disciplines and many opinions), the authors in [94] used GEE to illustrate a processing workflow for how LULC classes can be used to compute more complex ecosystem service values. Their open-source code and ecosystem model analyzed both optical RS imagery and DEM data. However, GEE did not provide the historical imagery they needed, meaning that not all the data the authors wanted to use were available on the platform.
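The sample-migration test in [99] rests on spectral similarity and spectral distance measures combined with thresholds. A hedged sketch follows; the threshold values are illustrative, not those of the paper:

```python
import math

def spectral_distance(ref, tgt):
    """Euclidean distance between two reflectance spectra."""
    return math.sqrt(sum((r - t) ** 2 for r, t in zip(ref, tgt)))

def spectral_angle(ref, tgt):
    """Spectral angle (radians) between reference and target spectra."""
    dot = sum(r * t for r, t in zip(ref, tgt))
    norm = math.sqrt(sum(r * r for r in ref)) * math.sqrt(sum(t * t for t in tgt))
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def sample_unchanged(ref, tgt, max_dist=0.2, max_angle=0.1):
    """A training sample migrates to the target date only if both measures
    fall below their thresholds (i.e., no land cover change is indicated)."""
    return spectral_distance(ref, tgt) < max_dist and spectral_angle(ref, tgt) < max_angle
```

For example, a pixel whose target-date spectrum closely tracks its reference spectrum passes both tests and keeps its label, while a strongly shifted spectrum is flagged as changed and dropped.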

Appendix C.3. Textual Summaries for Forest and Deforestation Monitoring

The authors in [120] analyzed Sentinel-2 data and trained several different ML classifiers to distinguish between four different forest types in Italy during both summer and winter seasons. The authors compared combinations of the visible and infrared bands, vegetation indices, DEM data, and unsupervised classification output as input to CART, RF, and SVM models to see what effect different data sources had on model performance. They found that the best performing model was a RF trained on all of the input data, though accuracy rates varied across different classes. The authors completed the entire analysis within the GEE platform, allowing others to rerun their analysis regardless of programming skill or available compute. However, the authors noted that this choice also meant not being able to use third-party libraries for data processing and analysis, as would be possible with the Python API for GEE. To create a forest-type map in India using RS imagery and ML, [123] predicted for evergreen and deciduous forest types, as well as “non-forest” classes. The authors created NDVI signatures based on Landsat imagery and fed this information to a RF. For several classes, the authors achieved low accuracy rates. However, they achieved higher accuracy than the current MODIS maps used for forest cover, and also showed where their predictions matched those of the MODIS maps. Analyses like this one contribute to a growing body of literature that show where current land maps need improvement and serve as a call to update land-use maps to a higher resolution. The authors made their code freely available both on GEE and GitHub so that their analysis can be rerun and improved upon. To classify tree species across a large area in China while fitting within compute constraints, [121] trained a RF on optical and SAR imagery, DEM data, and field observations on the GEE platform.
Across seven different tree species, the authors achieved an OA rate of 77.5%, but noted that including climate and soil data, in addition to incorporating ecological models, would help boost accuracy rates. The authors in [118] used GEE to map mangrove extent in Indonesia. The authors used a SVM trained on Landsat data while also predicting for water and cloud LULC classes. However, the authors had a low accuracy rate for identifying mangroves, the class that they were actually trying to predict. Most classification errors were related to cloud and hill shadows and to identifying mangroves farther away from the coastline. Further, the authors used visual assessment as their only accuracy metric. While representing classification accuracy visually is certainly important, more quantitative measures are needed in order to properly compare results from different studies.
The authors in [109] developed and tested a participatory mapping methodology to map the extent and species composition of forest plantations in the Southern Highlands area of Tanzania. A large set of reference data was collected in a two-week participatory GIS campaign in which local experts interpreted very high-resolution satellite images in Google Earth through the Collect Earth tool in the open-source Open Foris suite. Three different classifiers (CART, SVM, and RF) were tested to classify a multi-sensor image stack of Landsat 8 (2013–2015), Sentinel-2 (2015–2016), Sentinel-1 (2015), and SRTM derived elevation and slope data layers. A RF with 150 trees was selected for creation of the forest plantation area and planted species distribution maps. One of the main challenges in participatory reference data collection was the quality and consistency of the collected samples. The study found that sufficient training prior to the data collection was crucial for the interpretation success. The interpretation agreement generally declined as the level of detail increased from forest plantation coverage to specific plantation quality attributes. The authors stated that, at least in complex environments, it may not be realistic to expect good accuracy on detailed-level information such as tree species or age derived from visual interpretation of optical data. To explore how GEE could be used to create an open-source processing pipeline for deforestation mapping in Liberia and Gabon, the authors in [116] used two different RF models to create data masks and then predictions for various land types there. The output classification maps were then shown to local experts for correction, boosting the final accuracy rates. The authors showed that their method was more accurate than other efforts to classify deforestation rates in these two countries, though there were still some model misclassifications between classes due to insufficient ground-truth data.
This presents a future area of research, where ML/DL/CV models are used to generate first-order maps that are then verified by experts in the field (i.e., expert systems). Building land classification maps in this way saves experts’ time while keeping humans in the loop, where human values and knowledge can still be represented and included.
The authors in [125] developed a method for monitoring tropical forest loss and recovery based on Landsat data. First, the authors used a RF model to map canopy cover through time as a proxy for forest degradation and then applied the LandTrendr algorithm to detect changes over a 19-year period. They found that the most valuable variables for predicting tree canopy decline and regrowth were shortwave surface reflectance data and an index related to plant moisture. While Landsat data were useful for tracking changes in forest distribution through time, the authors noted that more very high-resolution products for ground-truthing would benefit their analysis, as would the use of SAR data, since tropical forests are covered by clouds much of the time. Using SAR data as input and high-resolution optical data as validation data, the authors in [124] trained a U-Net on Google Cloud to create monthly forest loss maps. They compared this model with a RF trained on GEE while testing both models in Brazil and the United States, where both logging activity and wildfires were prevalent. They showed that the U-Net model outperformed the RF in most cases, though the RF model still achieved high accuracy rates. However, when the U-Net model was trained on data from one region and then applied to the other, it did not perform well. Thus, the CNN was not generalizable and would need to be re-trained before being used in additional locations. In [117], the authors showed how GEE can be used to overcome data storage and compute needs and analyze about 20 years’ worth of Landsat data to determine forest cover changes. The authors used a RF model to show where deforestation has continued versus where forests have partly recovered. Then, they fed the predictions of their RF model to an ANN-based forest projection model to simulate forest loss up through 2028.
The authors noted that, because of a lack of reference high-resolution RS imagery, certain years in their analysis could not be validated. In [122], the authors used a RF for initial LULC classification, then used a MLP to simulate possible deforestation scenarios into the future. Finally, the authors used a Markov-based Cellular Automata (Markov-CA) model to analyze the probability of transition scenarios. Their results verified previous research findings, and their maps showed good agreement with current efforts to map forest change like those of the Amazon Deforestation Monitoring Project (PRODES) program. Moreover, the authors identified several key factors indicating high rates of deforestation, such as proximity to roads and urban centers.
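The Markov transition step at the heart of a Markov-CA projection can be sketched as follows. This shows only the class-fraction bookkeeping, omits the cellular-automata spatial allocation, and uses hypothetical transition probabilities:

```python
def markov_project(state, transition, steps):
    """Project class-area fractions forward with a first-order Markov chain.

    state: dict class -> current fraction; transition[a][b]: P(a -> b) per step.
    (The cellular-automata part of a Markov-CA model, which allocates the
    projected change spatially, is omitted here.)
    """
    classes = list(state)
    for _ in range(steps):
        state = {b: sum(state[a] * transition[a][b] for a in classes)
                 for b in classes}
    return state

# Hypothetical two-class example: forest with a 5% chance of clearing per step
t = {"forest": {"forest": 0.95, "cleared": 0.05},
     "cleared": {"forest": 0.00, "cleared": 1.00}}
projected = markov_project({"forest": 1.0, "cleared": 0.0}, t, steps=10)
```

With these illustrative probabilities, the remaining forest fraction after ten steps is 0.95 raised to the tenth power, and the fractions still sum to one.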
The authors in [107] demonstrated a low-cost method for monitoring industrial oil palm plantations in Indonesia using Landsat 8 imagery that allowed them to distinguish between oil palm (immature oil palm, mature oil palm), forest, cloud, and water classes using the CART, RF, and MD algorithms. Their results demonstrated that CART and RF had higher OA and Kappa coefficients than the MD algorithm. Critically, the authors compared model accuracy based on different combinations of spectral bands (particularly RGB, SWIR, TIR, and NIR), including all bands, to determine which would help specifically with oil palm plantation monitoring. The authors did not use SAR for this analysis but noted that in future work the combination of optical and SAR imagery might improve results. They also pointed out a need for more and better algorithms on the GEE platform. Lastly, the authors noted that there was a need to make intuitive, easy-to-use tools for specific tasks that incorporated input from the public and other stakeholders like NGOs and government agencies. The authors in [115] compared several ML models to map oil palm using 30 m Landsat 8 imagery in Malaysia. The authors found that tree-based models (e.g., RF, CART) worked better than a SVM for the task and were able to classify large areas with high accuracy. Even so, classification errors were traced to the relatively coarse resolution of Landsat data. The authors noted that higher resolution platforms like Sentinel and the future ability to use DL methods on GEE would lead to higher performance. As a highly forested landscape, southern Belize has been experiencing deforestation due to agricultural expansion. In [108], the authors utilized Landsat 8 imagery on GEE to perform a supervised classification. Subsequently, they built a MLP model to predict future deforestation patterns and magnitude based on the drivers of past deforestation patterns in the region.
The projections indicated that the forest cover in southern Belize would decrease from 75.0% in 2016 to 71.9% by the end of the projection period. The deforestation prediction maps can provide useful information for stakeholders on how to better allocate resources to protect forested landscapes and improve the biodiversity of ecosystems.
The authors in [15] addressed this issue by mapping disturbed forest areas in Brazil using 27 years of Landsat surface reflectance imagery. By separating out old-growth forests from degraded forests and deforested regions, the authors were able to produce an intact-disturbed forest map to track degraded forests. The whole processing pipeline was done on GEE using a RF. They integrated single-date features with temporal characteristics from six time-series trajectories, in particular, two Landsat shortwave infrared bands and four vegetation indices. The authors ran a relative variable importance analysis for each ecoregion. The authors were able to show that past maps were somewhat outdated due to their inability to separate forest classes into intact and degraded, although their results varied from ecoregion to ecoregion.

Appendix C.4. Textual Summaries for Vegetation Mapping

The authors in [129] developed and tested an approach to automate the mapping and quantification of vegetation cover and biomass using Landsat 7 and Landsat 8 imagery across the grazing season (i.e., changing phenological conditions). Using a best-subset regression modeling approach, they found that the best predictor variables vary by season, corresponding to vegetation phenology. It was found that NDVI, a rough proxy of vegetation production that is widely used in rangeland monitoring tools, is less accurate when vegetation contains high proportions of standing dead or senescent vegetation. Different NDVI thresholds were determined to guide season-specific model application. They showed that using NDVI to select among seasonal models for application increased accuracy when modeling vegetation amounts at varying growth stages compared to the single-variable, all-year normalized difference tillage index (NDTI) models. In [130], the authors utilized the historical Landsat satellite record, gridded meteorology, abiotic land surface data, and over 30,000 field plots within a RF model to predict per-pixel percent cover of annual forbs and grasses, perennial forbs and grasses, shrubs, and bare ground over the western United States from 1984 to 2017, at approximately 30 m resolution. The R ranger package, which provides diagnostic tools and variable importance ranking, was first used to define RF model parameters and select the optimal input variables. The RF model was then implemented in GEE to predict percent cover using the top 40 most important variables per class. With continuous rather than categorical estimates of vegetation cover, it is possible to assess changes in functional group composition, transitions to new vegetation states, efficacy of vegetation treatments, and vegetation dynamics pre- and post-disturbance across space and time.
Using climate and field data alongside Landsat imagery and MODIS land-use maps, ML models used in [21] were able to predict several important rangeland indicators such as plant height, total vegetation and rock cover, and bare soil. After running a relative variable importance analysis, the authors found that topographic variables were less important to the best performing model (RF), while the MODIS land map input data were the driving factor in model performance. However, the authors noted that because GEE did not have hyperparameter tuning for ML models, they trained some offline. Additionally, while this analysis used RS imagery and current land-use maps to make predictions, it was still reliant on field data. Because of a lack of observations during the winter for western US rangelands, the authors cautioned that more field observations would need to be collected to tune their model before it was used for making predictions during that season.
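The idea of using NDVI thresholds to select a season-specific model, as described for [129], can be sketched with illustrative cut-offs (the threshold values and model names below are hypothetical):

```python
def select_seasonal_model(ndvi, cutoffs):
    """Pick a season-specific model from ascending (upper_bound, name) pairs."""
    for upper, model in cutoffs:
        if ndvi <= upper:
            return model
    return cutoffs[-1][1]

# Hypothetical NDVI cut-offs separating phenological stages
cutoffs = [(0.25, "dormant-season model"),
           (0.45, "green-up model"),
           (1.00, "peak-growth model")]
choice = select_seasonal_model(0.60, cutoffs)  # "peak-growth model"
```

In practice the cut-offs would be fitted per region and season, as the study did when comparing against single-variable all-year models.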
The authors in [136] used a specific invasive species in China as a case study for developing a ML pipeline that took into account both cloud cover and phenological information. They compared the ability of a stacked autoencoder and a SVM to classify vegetation types. While the SVM was trained on GEE, the DL model had to be trained offline as the platform did not support DL models at the time. The authors found that the DL model performed better than the SVM and that both models performed better with phenological information. Even so, the authors noted that the 16-day return time of Landsat imagery was a limiting factor in their analysis and that further work could be done to apply their method to Sentinel imagery. Importantly, the authors in [136] called on other researchers to upload ground observation data and final maps to GEE so that studies can be replicated and results compared. Increasingly, researchers are not only producing maps but comparing them to current data products and seeing how they differ. In order to produce a map of this invasive species, [139] collected and processed field data in addition to UAS imagery and optical RS data from several different platforms. The authors trained a RF model for classification purposes, and while the data processing was done in GEE, the ML portion was done entirely outside the platform. By using a RF, the authors were able to show exactly how the model was making decisions, distinguishing between mud flats and water and different coastal grass species. While the authors were able to achieve high accuracy rates, issues related to cloud masking, not incorporating phenological information, and challenges in identifying submerged grasses in tidal areas led to some model uncertainty.
In order to clarify what changes had been occurring in Brazil, over three decades’ worth of Landsat imagery was used in [135] to determine which areas have experienced vegetation change. Two RF models were used on the GEE platform. The first classified land-use types and assessed the stability of predictions for those classes over half the total time period in question. The second was used to perform the overall classification, and this two-part process improved the accuracy by 4% (87%, up from 83% without assessing the stability of pixel classifications first). The main limitation was confusing class types like grassland, planted pasture, and savanna, though in the future radar and LiDAR data could help distinguish similar classes and boost OA. The resulting maps are freely available through the MapBiomas platform. The authors in [138] used an adaptive stacking algorithm to train a ML classifier on optical, SAR, and DEM data to identify wetland vegetation. Adaptive stacking uses one ML classifier to identify the optimal combination of ensemble classifiers and hyperparameters for a given task. In this case, the authors used a RF model to determine the best combination of the CART, MD, NB, RF, and SVM classifiers on GEE. The authors found that the adaptive stacking method was much more accurate than the RF and SVM models alone. The resulting classification map was then combined with a trend analysis performed by the LandTrendr algorithm, which allowed them to identify wetland vegetation distribution as it is now and also how it has changed over time. Additionally, [138] tested their workflow on different subsets of input data and showed that adding more data helped the adaptive stacking algorithm learn better (in fact, the best combination of input data was all of the data).
The authors noted that forest and reed classes were not identified well with their adaptive stacking algorithm, and that the LandTrendr algorithm would most likely need to be re-tuned in different environments.
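The core of an adaptive-stacking search, choosing the best-scoring subset of base classifiers on a validation set and combining their outputs by majority vote, can be sketched as follows. An exhaustive subset search stands in for the RF meta-learner used in [138], and the toy base classifiers are hypothetical:

```python
from itertools import combinations
from collections import Counter

def majority_vote(labels):
    """Combine one label per base classifier into a single label."""
    return Counter(labels).most_common(1)[0][0]

def select_ensemble(classifiers, X_val, y_val):
    """Try every non-empty subset of base classifiers; keep the subset whose
    majority vote scores best on the held-out validation set."""
    best, best_acc = None, -1.0
    names = list(classifiers)
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            preds = [majority_vote([classifiers[n](x) for n in subset])
                     for x in X_val]
            acc = sum(p == t for p, t in zip(preds, y_val)) / len(y_val)
            if acc > best_acc:
                best, best_acc = subset, acc
    return best, best_acc

# Toy base classifiers over a single feature (e.g., a wetness index)
base = {
    "cart": lambda x: "wetland" if x > 0.5 else "upland",
    "svm": lambda x: "wetland" if x > 0.3 else "upland",
    "always_wet": lambda x: "wetland",
}
subset, acc = select_ensemble(base, [0.1, 0.4, 0.7], ["upland", "upland", "wetland"])
```

Exhaustive search is only feasible for a handful of base learners (here three, matching the small CART/MD/NB/RF/SVM pool described above); a learned meta-model scales better.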
Bathymetry and RS data were combined in [127] to create a processing and analysis pipeline for large scale seagrass habitat monitoring in Greece using GEE. While the authors compared how CART, RF, and SVM models on the GEE platform performed on open-source datasets, they validated the models on unpublished data, which made it difficult to replicate their results. A key limitation of this processing workflow is the lack of in-situ validation data. Thus, their preprocessing pipeline depends on creating a data mask for labels using a ML model, which is then fed to ML models as input data. Any uncertainty or errors in the first output data layer would persist in the secondary classification step. Their reported OA was 72%, and the authors suggested more seagrass datasets for performance improvement. A CNN–LSTM hybrid model was used in [132] to identify grassland types in Sentinel-2 imagery in the United States. The authors collected ground-truth field data for their experiment, and with the help of GEE for preprocessing and Google Colab for NN training, they achieved an almost 7% accuracy boost for identifying a type of grass (98.8%, up from 92%). However, the authors’ dataset was very small (13 Sentinel-2 images in total, 6 images in 2016 and 7 in 2017, corresponding to their field survey years), so it was uncertain how this model would generalize to other regions in the same state or in different states altogether.
The authors in [43] compared the performance of a RF model with feature engineering to LSTM and U-Net NN models without feature engineering for identifying pasturelands in Brazil. The RF model was trained on GEE, while the NNs had to be trained offline as GEE did not support DL models at the time. The authors crowdsourced to domain experts the creation of a LULC dataset for Brazil using PlanetScope imagery, ensuring that the labels for the input data were accurate. These LULC classes contained important pastureland subtypes in addition to savannah, forest, built-up areas, and water. The U-Net had the highest generalization across both the validation and testing sets, maintaining high accuracy rates, while the LSTM and RF models underfit the test set. To illustrate the tradeoffs between ML and DL models, the authors included run and inference times. The RF model was able to complete training and prediction in 3 h. The LSTM took 30 min to train but 23 h to predict on the test set, while the U-Net took 24 h to train but 1.2 h at inference time. The authors in [126] used GEE to compare how well several ML classifiers perform relative to index-based methods like NDVI. Using over 40 years of optical Landsat imagery, the authors were able to map vegetation loss in Australia with high accuracy, matching that of a current government vegetation monitoring program (though their process relied only on cloud computing and freely available data). However, different amounts of rainfall affected their results, because the models were not able to fully recognize vegetation across varying greening and drying patterns. Future analyses should attempt to collect more and higher resolution data to improve model performance. The authors in [140] argued that phenology information in RS time series can better capture tidal flat wetland vegetation and so compared phenology information to statistical (min, max, median) and temporal features (quartile ranges).
They then fed these data into a RF while analyzing their effect on model performance during different periods of time (all data, green and senescence seasons) for wetland vegetation classification. The authors showed that the phenological information was the most important input feature to the RF, while combining all three sets of features led to the highest accuracy. Additionally, the model performed best when predicting over both the green and senescence periods. To explore how plant functional types can be derived directly from RS information, [137] trained a RF model on field, DEM, MODIS, and climate data. Their method was able to distinguish between moist and dry deciduous tree types with a high degree of accuracy, which could lead to better estimates of carbon, water, and energy fluxes. Still, the authors struggled to identify shrubs, grasses, crops, and built-up areas.
The authors in [141] implemented a Gaussian process regression (GPR) model optimized for retrieving green LAI from RS imagery, and did so in a way tailored to GEE. First, they created the model so that it could run on vector or tensor time series imagery. Then, the authors used AL for feature reduction so that the model only learned on important data, producing a model that could run within GEE’s memory confines. This GPR model was then used to gap-fill RS imagery focused on LAI, meaning the model was able to “see” through clouded optical imagery. More work like this should be done, either in creating new models to upload to the cloud that other researchers can use or in optimizing such models so that they are memory efficient. The authors mentioned that better GEE code documentation and error messages could help future researchers interested in developing custom ML models for the platform.

Appendix C.5. Textual Summaries for Water Mapping and Water Quality Monitoring

In [32], the authors created a web portal using GEE as a backend alongside an expert system to identify bodies of water in Landsat imagery. Visualizing global surface water allowed the authors to identify trends such as all continents gaining surface water, although this varied from region to region. While small bodies of water (30 m × 30 m or smaller) could not be mapped using the expert system, the process of mapping global surface water was sped up by the use of GEE compute resources. The authors noted that some regions had more accurate water maps because of the length of the observation record. In [142], the authors used all available Landsat images to study surface water dynamics in Oklahoma from 1984 to 2015. About 16,000 Landsat scenes were preprocessed using GEE. Subsequently, they computed spectral indices (e.g., MNDWI, NDVI, and EVI) and performed conditional operations to extract surface water areas. Four surface water products were created, including the maximum, year-long, seasonal, and average surface water extents. The results showed that both the number of surface water bodies and total surface water area had been decreasing from 1984 through 2015. Significant inter-annual variations in the number of surface water bodies and surface water areas were found. They also found that both the number of surface water bodies and surface water areas had a positive relationship with precipitation and a negative relationship with temperature.
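The index-plus-conditional approach described for [142] can be illustrated with a minimal NumPy sketch. The toy reflectance arrays, the zero MNDWI threshold, and the specific conditional rule (MNDWI must exceed both zero and NDVI) are illustrative assumptions, not the authors' exact criteria:

```python
import numpy as np

def mndwi(green, swir):
    """Modified Normalized Difference Water Index: (Green - SWIR) / (Green + SWIR)."""
    return (green - swir) / (green + swir)

def extract_water(green, swir, nir, red, mndwi_thresh=0.0):
    """Flag a pixel as surface water when MNDWI is above a threshold and
    exceeds NDVI, mimicking the kind of conditional rule that combines a
    water index with a vegetation index to reject vegetated pixels."""
    ndvi = (nir - red) / (nir + red)
    m = mndwi(green, swir)
    return (m > mndwi_thresh) & (m > ndvi)

# Toy 2x2 reflectance arrays: the top-left pixel is "water-like"
# (high green, low SWIR/NIR); the rest are land/vegetation-like.
green = np.array([[0.30, 0.10], [0.12, 0.11]])
swir  = np.array([[0.05, 0.30], [0.25, 0.28]])
nir   = np.array([[0.04, 0.40], [0.35, 0.38]])
red   = np.array([[0.06, 0.10], [0.09, 0.10]])
water = extract_water(green, swir, nir, red)  # only the top-left pixel is water
```

In GEE itself the same logic would run per-pixel over an image collection via `normalizedDifference` and band math; the NumPy version simply makes the rule explicit.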
The authors in [150] analyzed to what degree different preprocessing steps affect the output water maps, using both SAR and DEM data and two variations of Otsu’s thresholding algorithm. They showed that SAR data that included radiometric terrain correction (RTC) as a preprocessing step yielded more accurate results, and that Bmax Otsu thresholding was more stable across different inputs than Edge Otsu. However, their analysis was limited in time and space, so more work needs to be done to test their results in different locations and varying terrain types at different times. In [143], the authors used Landsat 8 images available on GEE to map glacial lakes in the Tibetan Plateau region. About 3580 Landsat scenes acquired in 2015 were preprocessed. After that, the MNDWI algorithm was applied to each image to extract glacial lakes with thresholding techniques. The initial results were then exported from GEE for further processing. They also analyzed the various characteristics of glacial lakes, including size classes, elevation, and climate forcing. The results revealed that climate warming played a major role in glacial lake changes. The authors in [151] compared the performance of MNDWI and a RF to that of a multi-scale CNN (MSCNN) and showed that the DL method was the most accurate (with fewer false classifications) for identifying urban water resources in several Chinese cities. However, the authors took a novel approach to the lack of DL methods available on GEE: they trained the CNN locally, and then uploaded the weight matrix to GEE. They then implemented the rest of the CNN’s features (convolutions, etc.) directly in GEE, effectively allowing the authors to run DL inference on the platform. Still, the MSCNN model had issues classifying small/thin water bodies and water scenes with mixed pixel classes. 
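Otsu's algorithm, the core of both thresholding variants in [150] (and also applied in [172] below), can be sketched in plain NumPy: it picks the histogram threshold that maximizes between-class variance. The bin count and the synthetic bimodal backscatter sample are assumptions for illustration, not the authors' data:

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Classic Otsu: choose the threshold that maximizes between-class
    variance of the value histogram, e.g. to split low-backscatter water
    from higher-backscatter land in SAR imagery."""
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    total = hist.sum()
    sum_all = (hist * centers).sum()
    best_t, best_var = centers[0], -1.0
    w0, sum0 = 0.0, 0.0
    for i in range(bins - 1):
        w0 += hist[i]                 # weight (pixel count) of class 0
        if w0 == 0:
            continue
        w1 = total - w0               # weight of class 1
        if w1 == 0:
            break
        sum0 += hist[i] * centers[i]
        mu0 = sum0 / w0               # class means
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, centers[i]
    return best_t

# Synthetic bimodal sample: "water" backscatter near -18 dB, "land" near -8 dB.
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(-18, 1, 500), rng.normal(-8, 1, 500)])
t = otsu_threshold(samples)  # falls between the two modes
```

The Bmax and Edge variants compared in [150] differ mainly in how the histogram sample is selected (checkerboard cells with high bimodality vs. buffered edge pixels) before this same core computation is applied.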
One way to make the data labeling process less time- and resource-intensive for DL was illustrated in [156], where the authors used current water maps and a segmentation algorithm to automatically collect data labels from Sentinel-1 imagery. These data were then used to train variations of U-Net in an offline environment. Due to computational constraints, the authors were not able to compare their model to more traditional ML models like a RF. Even with their automated data labeling pipeline, the authors noted that their study lacked sufficient data to adapt their method to more than one country, and manual checks were still necessary to validate the model post-prediction.
Optical imagery used in surface water mapping analyses is often occluded by clouds, and many common methods used to map surface water confuse snow, ice, rock, and shadows with water. DeepWaterMapv2 was released in [147] to address these false positive misclassifications. The authors used Landsat imagery from GEE to train their NN architecture to identify bodies of water across different terrain types and in different weather conditions. However, due to the compute constraints and lack of NN models on GEE, the authors moved the data offline during the training process. The authors designed the network to work with many different satellite platforms as long as they provide a set group of input bands. The authors in [157] used masking, filtering, and segmentation algorithms to identify bodies of water in Sri Lanka in complex, mountainous environments. They showed that their model performed well even in the presence of shadow or soil, and did so much better than other common index-based methods like NDWI, MNDWI, or the multi-spectral water index (MuWI-R). To explore the potential to distinguish between surface water body subtypes, [158] used slope, shape, phenology, and flooding information as input to a RF model to predict lakes, reservoirs, rivers, wetlands, rice fields, and agricultural ponds. Their method did not work well for wetlands, and the OA was not very high (85%) across classes. However, the RF model they used was interpretable, and they showed which subclasses were easier or more difficult to predict. Unfortunately, the entire preprocessing method could not be run directly on GEE: because the shape features could not be calculated on the platform and were crucial to the overall analysis, the authors first had to compute them in a local environment and then upload them.
The authors in [144] proposed a new method for quickly mapping yearly minimal and maximal surface water extents. Using GEE and Landsat images, temporal changes in the extent of surface water in the Middle Yangtze River Basin were identified. Firstly, based on the estimated value of cloud cover for each pixel, highly cloud-covered pixels were removed to eliminate cloud interference and improve calculation efficiency. Secondly, the annual greenest and wettest images were mosaicked based on a vegetation index and a surface water index. Thirdly, the minimum and maximum surface water extents were obtained by RF classification. Finally, manual noise removal as implemented in ESRI ArcMap was applied to reduce noise in the classification result. In [148], the authors integrated the global surface water (GSW) dataset and the SRTM DEM to determine the spatiotemporal patterns of water storage changes in China’s lakes and reservoirs. The dynamic water storage changes of 760 lakes and reservoirs, each with an area greater than 10 km2, were evaluated over a time span of 30 years (1984–2015), their total area accounting for about 80% of the total water surface area in China. The HydroLAKES data, China’s lake dataset, and a river shapefile were also used to help select lakes and reservoirs. Water level data for a total of 30 lakes across China from the Hydroweb dataset were used for validation. A DEM-based geo-statistical approach was used to construct hypsometric relationships between water area and elevation for each lake and reservoir. Their data preprocessing was implemented using ArcGIS; GEE was used for extraction and correction of water coverage and extraction of surface area–elevation pairs; and R software was used for statistical analysis of pixel contamination ratios, hypsometric analysis, and identification of spatio-temporal patterns.
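The area–elevation hypsometric idea in [148] can be sketched in a few lines: count the DEM cells inundated at each candidate water level, then integrate area over elevation to approximate storage change. The bowl-shaped toy DEM, unit cell area, and simple trapezoidal integration are simplifying assumptions, not the authors' full geo-statistical procedure:

```python
import numpy as np

def hypsometric_curve(dem, levels, cell_area=1.0):
    """For each candidate water level, inundated area is the count of DEM
    cells at or below that level times the cell area (cell_area is a
    hypothetical placeholder; a 30 m Landsat cell would be 900 m^2)."""
    return np.array([(dem <= z).sum() * cell_area for z in levels])

def storage_change(levels, areas, z0, z1):
    """Approximate the volume gained between water levels z0 and z1 by
    trapezoidal integration of the area-elevation curve."""
    mask = (levels >= z0) & (levels <= z1)
    lv, ar = levels[mask], areas[mask]
    return float(np.sum((ar[1:] + ar[:-1]) / 2 * np.diff(lv)))

# Toy basin: a simple bowl-shaped DEM with elevations from ~0 (center) to 2 (corners).
x = np.linspace(-1, 1, 50)
dem = np.add.outer(x**2, x**2)
levels = np.linspace(0.0, 2.0, 21)
areas = hypsometric_curve(dem, levels)        # monotonically non-decreasing
volume = storage_change(levels, areas, 0.0, 2.0)
```

In the study, the area term comes from GEE-derived water extents and the elevation term from the SRTM DEM; pairing observed areas with the fitted curve then yields water level and storage estimates for dates without direct level measurements.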
The authors in [154] reviewed recent fluvial geomorphology GEE applications and synthesized three common themes relevant to future planimetric river channel change studies: (1) GEE has been used as a tool for mining the satellite imagery data archive, cloud-masking images and then generating multitemporal image composites; (2) many applications have provided accessible source code and/or data repositories, promoting transparent and open science; (3) cartographic, graphical, and statistical analyses are almost always completed outside of the GEE environment. This study [154] shared a demonstration workflow showing how GEE can be used to extract active river channel masks from a section of the Cagayan River (Luzon, Philippines). The spatiotemporal planform change was then quantified outside of the GEE environment, i.e., by extracting centerline position and channel width and calculating centerline migration rates. For RS applications in fluvial geomorphology, challenges remain around issues of scaling, transferability, and data uncertainties, particularly for small- to mid-sized rivers where medium-resolution, multispectral satellite imagery is rarely suitable for geomorphic analyses. Caution is always required to interpret geomorphic changes based on two-dimensional planforms alone, as rivers also adjust in the vertical dimension. By enabling fluvial geomorphologists to take their algorithms to petabytes’ worth of data, GEE is transformative in enabling deterministic science at scales defined by the user and determined by the phenomena of interest. GEE offers a mechanism for promoting a cultural shift toward open science, through the democratization of access and sharing of reproducible code.
The authors in [146] stated that theirs was the first study using GEE for RS of water quality parameters in inland waters. Using Landsat imagery in conjunction with ground-based measurements of CDOM absorption and DOC concentrations, a regression-based model was built to estimate CDOM in the six largest Arctic rivers using 424 separate observations from 2000 to 2013.
To estimate water quality parameters like Chl-a concentrations, turbidity, and dissolved organic matter, [152] used ML and DL models to analyze RS imagery. The authors showed that several ML and DL models were able to achieve very low error rates for this regression task. Some of the relationships detected by the models could be used to predict non-optical variables as well. However, the authors had to move the ML portion of their analysis off the GEE platform due to “algorithmic limitations” (inflexible models). While a DL model performed well for predicting various water quality indicators, [152] cited a lack of model transparency. They cautioned that feature extraction and expert knowledge may still be necessary to make sense of the DL model outputs; otherwise, they were difficult to interpret, negating the level of accuracy achieved with the model. The authors in [153] developed a methodological framework for mapping Chl-a concentrations with multi-sensor satellite observations and in-situ water quality samples. A SVM model was trained on the GEE cloud-computing platform and used to predict Chl-a concentrations for 12 inland lakes in the tri-state region of the U.S., including Kentucky, Indiana, and Ohio. The results demonstrated that GEE and multi-sensor satellite observations can enable fast and accurate mapping of Chl-a at a regional scale.
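A minimal stand-in for this kind of regression setup, pairing satellite-derived features with in-situ water quality samples, might look like the following scikit-learn sketch. The synthetic band ratios and the linear Chl-a relationship are fabricated assumptions for illustration; the actual studies used real multi-sensor reflectance and field samples:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: three hypothetical reflectance band ratios per sample,
# with "Chl-a" generated as a noisy function of the first two.
rng = np.random.default_rng(42)
X = rng.uniform(0.2, 2.0, size=(200, 3))
y = 10 * X[:, 0] - 4 * X[:, 1] + rng.normal(0, 0.5, 200)

# Support vector regression, analogous in spirit to the SVM used in [153];
# scaling matters for kernel methods, hence the pipeline.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
model.fit(X[:150], y[:150])
pred = model.predict(X[150:])
rmse = float(np.sqrt(np.mean((pred - y[150:]) ** 2)))  # held-out error
```

In GEE the equivalent would be `ee.Classifier`/regressor objects applied to sampled image regions; moving the model off-platform, as [152] did, trades GEE's data access for the flexibility of a full ML library.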

Appendix C.6. Textual Summaries for Wetland Mapping

Cloud computing on GEE was utilized in [35] to create an open-source, reproducible map of wetland occurrence probability using LiDAR and RS data for the entire province of Alberta. Using a BRT, the authors were able to match a current governmental effort in Alberta while also producing relative variable importances showing which RS variables might be the most useful for future wetland mapping efforts in the area. However, the authors noted that uncertainties in the underlying training dataset, a lack of subsurface soil information, and having to move between GEE and offline analysis may have contributed to errors in this analysis. The authors in [162] showed that by combining SAR, optical, and LiDAR data on the GEE platform, a BRT model was able to predict peatland occurrence across Alberta with relatively high accuracy at high resolution. Using different input variable selection methods and optimization techniques, the authors were able to trim down their dataset to six variables, saving time and compute in the final analysis while pointing future studies towards which data would be most useful to collect for peatland mapping. The authors of [162] pointed out that additional training data from field work or photo interpretation will aid future peatland monitoring and detection studies and that more research needs to go into distinguishing between different wetland classes.
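The BRT-plus-variable-importance workflow used in these studies can be sketched with scikit-learn's gradient boosting as a stand-in for GEE's BRT implementation. The synthetic predictors below (one strong "wetness" signal, one weak "terrain" signal, one noise band) are assumptions chosen purely to make the importance ranking visible:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-ins for RS predictor layers.
rng = np.random.default_rng(0)
n = 500
wetness = rng.normal(0, 1, n)   # strongly informative
terrain = rng.normal(0, 1, n)   # weakly informative
noise = rng.normal(0, 1, n)     # uninformative
is_wetland = (2.0 * wetness + 0.3 * terrain + rng.normal(0, 0.5, n)) > 0

X = np.column_stack([wetness, terrain, noise])
brt = GradientBoostingClassifier(n_estimators=100, random_state=0)
brt.fit(X, is_wetland)

# Relative variable importance, analogous to the ranking of RS inputs
# reported for future wetland mapping efforts.
importance = dict(zip(["wetness", "terrain", "noise"],
                      brt.feature_importances_))
```

The same importance-driven pruning motivates the reduction to six variables in [162]: low-importance layers can be dropped with little accuracy cost, saving compute on subsequent runs.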
The authors in [161] used optical and SAR RS imagery to produce a 10 m resolution wetland map for the entire province of Newfoundland, Canada, using both a RF model and SNIC. Optical data contributed more to the accuracy of the models, although including SAR boosted accuracy rates. While OA rates were high for distinguishing between wetland and non-wetland classes, distinguishing between wetland sub-types (bog, fen, marsh, etc.) remained difficult. Limitations of the study included not having access to harmonized Landsat–Sentinel data on GEE, not being able to use TensorFlow or DL models on GEE, and a continued lack of ground-truth data for wetland detection studies. In [170], the authors classified wetlands in Newfoundland during three different periods to show the spatial dynamics of these ecosystems. The authors obtained high accuracy rates using both a RF and a CART model and were even able to distinguish between wetland subtypes like bogs, fens, and peatlands. The authors used Landsat imagery because its data catalog goes back to the 1980s, which was necessary given the length of the wetland change detection period they were interested in. Still, the authors noted that future mapping applications should focus on using higher-resolution products like Sentinel imagery to increase accuracy rates even further over wide areas. The authors in [17] proposed using field data collected from one Canadian province to create wetland inventory maps for several others using a mix of optical, SAR, and digital elevation data. However, the authors received mixed accuracy results from their RF model, most likely because the study rested on the assumption of a static underlying data distribution between wetlands across Canadian provinces. 
The authors noted that their results could be improved if the GEE platform allowed more samples to be analyzed at once, if there were more flexibility in choosing ML model hyperparameters, or if more segmentation algorithms were included on the platform.
Across Canada, wetland mapping is a well-studied problem. However, different local and regional agency wetland inventories use different techniques for monitoring wetlands or have altogether different definitions of what constitutes a wetland. Thus, even though several large-scale wetland maps have been produced, they are often not directly comparable. Additionally, these maps are often static and do not continually monitor wetlands through time. However, as [165] detailed, these are not the only barriers to mapping wetlands using RS imagery. Others include obtaining sufficient and recent field data to verify wetland monitoring products, but also the difficulty of monitoring such dynamic landscapes: wetlands do not have clear-cut boundaries, are extremely diverse landscapes and ecosystems, and are often in flux throughout seasons and years due to flooding and drying. Using optical and SAR Sentinel data in addition to field samples over the entirety of Canada, the study in [165] produced a high-resolution (10-m) wetland inventory map of Canada (an approximate area of one billion hectares) from multi-year, multi-source (Sentinel-1 and Sentinel-2) RS data on the GEE platform, showing that almost one-fifth of Canada is covered in wetlands. The whole country was mapped using a large volume of reference samples and an object-based RF classification scheme, with an OA approaching 80%. The authors used both pixel- and object-based classification with a RF model and SNIC to reduce noise in the output map. However, they came into the study with an accuracy threshold in mind and adjusted the training dataset to meet it after already seeing accuracy results. The authors reported uneven performance across Canadian provinces, mainly due to a lack of RS or field data in some locations. The authors in [160] analyzed a large number of field samples alongside Landsat imagery with a RF model to produce a wetland map for all of Canada. 
While this analysis showed how GEE made it easier to scale up the spatial scope of a given analysis (i.e., move from local to regional, country-level, or global scope), [160] obtained low accuracy scores across Canada. The authors noted that more field samples and the use of SAR data could improve future results, given that large parts of Canada are often covered by clouds and snow throughout the year. The authors in [168] proposed an object-based classification method to classify Sentinel-1 and Sentinel-2 data on the GEE platform, which resulted in the 10-m Canadian Wetland Inventory. The method consisted of a simple non-iterative clustering algorithm and the RF algorithm, which were applied to identify wetlands in each of the 15 ecozones in Canada. The overall accuracies for each ecozone ranged from 76% to 91%, representing a 7% improvement over the first generation of the Canadian Wetland Inventory.
The authors in [163] used NAIP imagery and LiDAR derived DEM data to detect wetlands across the northern United States using unsupervised classification on the GEE platform. They then compared their output with Joint Research Centre (JRC) Monthly Water History and National Wetland Inventory (NWI) data. Additionally, all code and implementation details were made open source, making it easy for others to verify or build on their results. A benefit of their technique is that unsupervised learning does not rely on underlying ground-truth data, often a bottleneck in ML and wetland mapping studies. However, this was also a limitation in the study as it was difficult to verify their resulting maps other than by comparison with other water and wetland map products (which themselves could have inaccuracies). To get around the limitation that wetlands can be both wet and dry over the course of the same season, the authors in [171] combined Sentinel-1 and -2 imagery with aerial photographs and field data to map the spatial variation of wetlands in portions of the United States over time. First, the authors trained RF and SVM models to predict the occurrence of wetlands and then masked out permanent water using the JRC Global Surface Water dataset. This allowed the authors to show not only permanently inundated wetlands, but how wetlands change over time. The RF model was the most accurate when compared to the SVM and NDWI, while also reducing false positives and negatives. The authors made their workflow open source in the hopes that conservation managers or people without coding experience can rerun their analysis for updated wetland extent information. More analyses should take into account spatial variation while producing environmental mapping applications, especially as governments and nonprofits make conservation decisions based on them. 
The authors in [159] explored the possibility of using GEE to map coastal wetlands in Indonesia by comparing all of the different classifiers available on the platform and how they performed with Landsat, digital elevation, and Haralick texture data. While the results showed that the CART algorithm performed best on this task across every year of training data, it was unclear from the results whether feature engineering and PCA bands helped the model learn better than the spectral input data alone. While GEE allowed [159] to train several models, some failed to run due to computational constraints or inflexibility. The authors showed that in all cases, ML models did much better at binary than multi-class classification.
With Landsat 8 and high-resolution Google Earth imagery, [164] used a RF model on GEE to classify tidal flat types and their distribution in China. The authors reported very high classification rates across tidal flat classes and showed that their GEE-based methods compared favorably to, or substantially outperformed, classification of tidal flats based on visual interpretation. However, the authors detailed that satellites like Landsat did not fully capture tidal ranges, meaning that accuracy could be improved further with future data products that observe full tidal duration distributions. In [169], the authors developed a pixel- and frequency-based approach to generate annual maps of tidal flats at 30-m spatial resolution in China’s coastal zone using Landsat TM/ETM+/OLI images and the GEE cloud computing platform. The resulting map of coastal tidal flats in 2016 was evaluated using very high-resolution images available in Google Earth. The annual frequency maps of open surface water bodies and vegetation were first produced using Landsat-based time series vegetation indices and a water-related spectral index. Pixels with a water body frequency spanning from 0.05 to 0.95 were classified as intertidal zones. A threshold value of 0.05 was used to separate coastal vegetation areas (vegetation frequency ≥ 0.05) from non-vegetated tidal flats (vegetation frequency < 0.05). Mixed pixels, such as remnant tidal flat water, could not be detected. In [172], the authors first processed high-resolution RS and UAS imagery to map minimum and maximum water and vegetation extent. They then used Otsu’s thresholding algorithm to automatically detect the best threshold for each index. These two indices were then combined in a composite that showed the total intertidal area in the RS imagery, to which the authors again applied the Otsu thresholding algorithm. The end result was a highly accurate map of tidal flats that did not require any post-processing. 
The authors compared their results with other tidal flat datasets in China and noted that their method produced (at least visually) better estimates because it incorporated high-resolution imagery, did a better job at cloud-masking, and achieved better estimates of tidal minima and maxima. Still, the authors noted that more imagery of high and low tides needed to be collected, which would increase the accuracy of their method.
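The frequency thresholds reported for [169] (water frequency between 0.05 and 0.95 marks intertidal zones; within those, vegetation frequency 0.05 splits coastal vegetation from non-vegetated tidal flats) translate directly into a per-pixel rule set. The toy frequency arrays below are illustrative; the class codes are arbitrary labels for this sketch:

```python
import numpy as np

def classify_coastal(water_freq, veg_freq):
    """Per-pixel rule set following the reported thresholds: pixels wet
    5-95% of observations are intertidal; within that zone, vegetation
    frequency >= 0.05 marks coastal vegetation, otherwise a non-vegetated
    tidal flat. Everything else (permanent water, dry land) is 'other'."""
    intertidal = (water_freq >= 0.05) & (water_freq <= 0.95)
    classes = np.zeros_like(water_freq, dtype=int)   # 0 = other
    classes[intertidal & (veg_freq >= 0.05)] = 1     # 1 = coastal vegetation
    classes[intertidal & (veg_freq < 0.05)] = 2      # 2 = non-vegetated tidal flat
    return classes

# Four toy pixels: permanent water, vegetated intertidal,
# bare intertidal, and dry land.
water_freq = np.array([0.99, 0.50, 0.50, 0.01])
veg_freq   = np.array([0.00, 0.20, 0.00, 0.80])
labels = classify_coastal(water_freq, veg_freq)  # -> [0, 1, 2, 0]
```

On GEE, the frequency layers themselves come from reducing annual image collections of index-based water/vegetation masks with a mean reducer; the thresholding above is then a simple band expression.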
A RF model was used on GEE in [166] to identify water cavities where sebkhas form in Morocco. The authors used digital elevation data, SAR, and optical imagery, as well as digital photos, on GEE to identify saltwater cavities and their aquifers with high accuracy. However, future challenges remain in incorporating multi-sensor, multi-temporal, multi-resolution RS big data and in improving open-source, cloud-based ML workflows for EO data. The authors in [167] compared the performance of a XGBoost model to a CNN for wetland type classification. The authors achieved a decent overall accuracy, but the F1-score was poor, so it was not clear what the models were actually learning. The authors were also not able to train the two models on the same subsets of data, making their performance not directly comparable. However, in addition to making their resulting maps and trained CNN model open source, the authors ran an informative comparison of the training and prediction times of the two models used in this study. The CNN and XGBoost model took about the same time to train, but the CNN took far less time to predict on the test set. More studies should adopt this reporting metric so that researchers can more clearly evaluate the tradeoffs between using specific models for their use-cases.

Appendix C.7. Textual Summaries for Infrastructure and Building Detection, Urbanization Monitoring

The authors in [178] created a large, vectorized, ground-truth-verified dataset in India specifically for the purpose of training different ML models. They verified the utility of the dataset by training CART, RF, and SVM models on GEE and compared their predictions to those of the WorldPop dataset. While manually creating a large dataset takes time, the authors showed that they could achieve accuracy rates of 87% with the RF model. The authors also compared different combinations of input data and their impact on model performance. For their application, Landsat 8 data served as better input than Landsat 7 alone or Landsat 7 data with computed indices like NDVI.
To investigate how best to identify impervious materials in RS imagery regardless of cloud cover, [182] combined nighttime light, DEM, and SAR data with a RF model on GEE. Their resulting maps were more accurate than commonly used maps like GlobeLand30. More importantly, though, the authors quantitatively showed that using multiple sources of data was better than single sources for this task: optical data were the most important, but SAR data improved accuracy rates across all metrics. The mounting expansion of impervious surfaces (major components of human settlements) could lead to a series of human-dominated environmental and ecological issues. In [180], the authors put forward a new scheme to conduct long-term monitoring of impervious-relevant land disturbances using Landsat archives. The developed region was identified using a RF classifier. The GEE version of LandTrendr was then used to detect land disturbances, characterizing the conversion from vegetation to impervious surfaces. Finally, the actual disturbance areas within the developed regions were derived and quantitatively evaluated.
The authors in [179] assessed the impact of urban form on the landscape structure of urban green spaces in 262 cities in China. They preprocessed and classified 6673 Landsat scenes for these cities using the RF classifier on GEE. Subsequently, they calculated several landscape structure metrics and urban form metrics. To evaluate the relationship between landscape metrics and urban form metrics, a BRT model was constructed to analyze their relationships. The results revealed that cities with a high road density tended to have a smaller area of urban green spaces and be more fragmented. In contrast, cities with complex terrains tended to have more fragmented urban green spaces.
A semi-automatic large-scale and long time series (LSLTS) urban land mapping framework was demonstrated in [183] by integrating crowdsourced OpenStreetMap (OSM) data with free Landsat images to generate annual urban land maps in the middle Yangtze River basin (MYRB) from 1987 to 2017. First, the annual Landsat images and the related spectral indices were collected and calculated in GEE. The OSM-related data were collected and processed manually in ArcGIS to generate the training samples. Then, the generated samples were uploaded to GEE. Two classification algorithms were used: CART and RF. Pixels that were classified as urban land by both methods were labeled as urban land. The classified maps were downloaded from GEE, and a spatial-temporal consistency check was further performed. Except for the generation of reference data for training and validation as well as post-classification analysis, most of the data processing was performed automatically in GEE. Use of crowdsourced geographic data (CGD) such as OSM came with many challenges: OSM polygons may overlap and contain multiple LULC types; there is a large diversity of tags in OSM, some of which cannot be converted directly to LULC classes; and most human activity is concentrated in urban areas, resulting in an imbalance of (non-urban) class data. The authors noted a lack of GEE infrastructure, such as (1) a GEE API related to CGD that could facilitate training sample generation, and (2) direct import of the annual Google Earth very high resolution (VHR) images into GEE, which users could set as a background image to collect validation samples in the cloud. In this study, urban areas on RS images were defined as sites dominated by a built environment, including all non-vegetative, human-constructed elements; in the OSM data, they were defined as features tagged as non-vegetative, human-constructed elements, including road networks and buildings.
To explore the possibility of identifying greenhouses in RS imagery over a large area in China, [185] designed an ensemble ML model to distinguish them from water, forest, farmland, and construction sites. The authors found that of the various ML models available on GEE, the CART, gmoMaxEnt, and RF models performed the best. These models were then combined through a weighting system to make predictions, and the resulting ensemble model performed better at this classification task than any of the individual models. Additionally, [185] looked at which features played the most important role in the ML models’ predictions. The authors found that spectral information was most useful, but that texture and terrain features boosted accuracy even further. However, this method relies on optical imagery, so it depends on relatively cloud-free conditions. More work would need to be done to help the model generalize to situations where cloud-free imagery is not available and to distinguish between greenhouse subtypes.
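The weighting-system idea can be sketched as weighted soft voting over member classifiers. Here scikit-learn models stand in for GEE's CART, RF, and gmoMaxEnt (logistic regression is used as a rough maximum-entropy stand-in, since gmoMaxEnt has no direct sklearn equivalent), and the weights are hypothetical rather than the authors' accuracy-derived values:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic two-class stand-in (e.g., greenhouse vs. other surfaces).
rng = np.random.default_rng(1)
X = rng.normal(0, 1, (400, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
X_train, y_train, X_test, y_test = X[:300], y[:300], X[300:], y[300:]

members = [DecisionTreeClassifier(random_state=0),      # CART stand-in
           RandomForestClassifier(n_estimators=50, random_state=0),
           LogisticRegression()]                        # gmoMaxEnt stand-in
weights = np.array([0.2, 0.4, 0.4])  # hypothetical accuracy-derived weights

# Weighted soft vote: sum the weighted class probabilities, then argmax.
probs = np.zeros((len(X_test), 2))
for w, m in zip(weights, members):
    m.fit(X_train, y_train)
    probs += w * m.predict_proba(X_test)
ensemble_pred = probs.argmax(axis=1)
accuracy = float((ensemble_pred == y_test).mean())
```

Soft voting lets a confident member outvote two uncertain ones, which is one reason a weighted ensemble can beat each individual model on mixed scenes.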
The authors in [186] designed a workflow for mapping urban sprawl over time in Brazil using a RF on the GEE platform. They used optical RS imagery from the Landsat and Sentinel platforms, alongside DEM data and found that the cities used for their case study had built out horizontally instead of densifying vertically. Still, the drivers behind the urban sprawl need to be investigated further, in addition to how best to incorporate their maps into the governmental policy decision-making process.
Using different vegetative indices (EVI, Gross Primary Production, etc.) derived from Landsat and MODIS data, [181] showed that urban sprawl in Shanghai had increased significantly in the last decade and a half. The spread of suburbs in Shanghai had led to much less green space over a 15-year period. This is a very impactful area of research that can be done completely on the GEE platform and replicated across cities around the world. Produced together with heatmaps of a given city, urban vegetation maps can be used to pursue environmental justice strategies that can improve equitable access to green spaces and attempt to reduce extreme temperature disparities (“heat islands”) in cities.
Producing up-to-date land cover maps can be time-consuming and expensive. This is especially true in areas without dense data coverage for common LULC classes. In [184], the authors combined Landsat 5 and 8 RS imagery, slope from a digital terrain model (DTM), and GLCM information, and then trained a SVM to output two classification maps for portions of Rwanda: one for 1987 and the other for 2019. The authors then used the LandTrendr algorithm to compute LULC changes through time, which allowed them to produce maps without having dense field observations for validation. They showed that while water, wetland, and forested areas had remained fairly constant in terms of total area, urban development had been replacing open land and agricultural areas.

Appendix C.8. Textual Summaries for Wildfires and Burned Area

The authors in [190] proposed a method for identifying fire-induced disturbances using the LandTrendr and FormaTrend algorithms on the GEE cloud-computing platform. Various metrics were used to quantify fire disturbances, such as type, magnitude, direction, and duration. The results showed that the FormaTrend algorithm outperformed the LandTrendr algorithm in identifying low-severity fire-induced disturbances. Nevertheless, the LandTrendr algorithm can be useful for generating change metrics that are useful for studying post-disturbances.
To determine the impact of using higher-resolution RS data products, [192] compared how Landsat and Sentinel optical imagery affected a ML model’s performance in burned area classification. The authors used Weka clustering output and different spectral and index information as input to the CART, RF, and SVM models available on GEE. They found that both Landsat and Sentinel imagery produced maps that captured small burn areas missed by current maps and fire-monitoring products such as MODIS, though Sentinel imagery led to an underestimation of burned area. Additionally, the authors found that the tree-based algorithms performed comparably to each other but much better than the SVM model. This study highlighted the importance of analyzing different data sources and ML models to quantify their respective contributions to predictive performance.
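Burned-area classifiers of this kind typically start from a burn-sensitive index. A minimal sketch of the widely used NBR/dNBR approach follows; the severity breakpoints are the commonly cited USGS/Key–Benson values, which studies often recalibrate locally, and the reflectance values are hypothetical:

```python
def nbr(nir, swir2):
    """Normalized Burn Ratio: (NIR - SWIR2) / (NIR + SWIR2)."""
    return (nir - swir2) / (nir + swir2)

def burn_severity(dnbr):
    """Classify dNBR (pre-fire NBR minus post-fire NBR) with commonly
    cited breakpoints; illustrative only, not a study-specific scheme."""
    if dnbr < 0.10:
        return "unburned"
    if dnbr < 0.27:
        return "low severity"
    if dnbr < 0.66:
        return "moderate severity"
    return "high severity"

pre, post = nbr(0.45, 0.15), nbr(0.20, 0.30)  # healthy canopy, then char
print(round(pre - post, 3), burn_severity(pre - post))  # → 0.7 high severity
```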
The authors in [193] developed an automated and cloud-based workflow for generating a training dataset of fire events at a continental level using freely available RS data. The training dataset was applied to different machine learning algorithms (i.e., RF, NB, and CART). It was found that the RF outperformed the other algorithms, which was hence used further to explore the driving factors using variable importance analysis. The results showed that the most important variables were soil moisture, temperature, and drought.
In [195], the authors used Sentinel-2 data along with two different burned-area and LULC maps to train different ML classifiers (k-nearest neighbor (KNN), RF, SVM) to map wildfire damage in Australia. They first used an optimization algorithm to select features and showed that this improved performance for every model used. The RF model with feature selection performed the best, and the authors were able to predict burned areas within different LULC types, whereas previous studies had focused on producing binary burned/non-burned maps. However, [195] noted that low-resolution LULC maps were a limiting factor in their analysis and that future studies could be repeated with higher-resolution maps to improve model performance further.
The authors in [196] designed a completely cloud-based DL workflow combining Google Cloud and GEE to classify burn scar areas in Brazil. Using a DNN, the authors produced a fire burn map that was more accurate than maps produced by MODIS and the National Institute for Space Research in Brazil. However, perhaps more importantly, the authors identified where their map disagreed with the other maps and why. They found that the southern areas of the Cerrado were misclassified more often in all three maps, and that clouds, shadows, and plant regrowth were the main features leading to misclassification. This type of analysis is important because it can highlight where current maps fall short while making them interoperable with the higher-resolution, more accurate maps being produced today. Still, [196] noted that the number of ground-truth observations was the limiting factor in their analysis and that the model’s performance could be improved further with more validation data.
The 250 m spatial resolution of products like FireCCI51 leaves out a lot of detail, so the authors in [191] used CBERS, Gaofen, and Landsat imagery to create a 30 m burned-area dataset for 2015. The authors first trained a RF on this imagery and set it to output probabilities instead of class predictions. These probabilities were then used as a starting point for a pixel-aggregation algorithm that classifies neighboring pixels as belonging to the burned-area class or not. The authors called this “burned-area shaping”, and the resulting maps were used as training data for an SVM. The resulting map had good spatial agreement with FireCCI51 but much higher spatial resolution, with more detailed and accurate boundaries. However, the authors noted that their method had difficulty distinguishing burned areas from recently plowed fields in agricultural areas, so crop-type masks should be used to remove potential false positives. Additionally, Landsat data were used for both the data collection and validation stages. Thus, the authors were not able to assess the suitability of using Landsat imagery for data collection purposes despite their high accuracy rates. Later on, [194] adapted the exact same processing steps on GEE to produce a burned-area map for the year 2005, illustrating how sharing and storing code on GEE makes it easy to re-run analyses or adapt them for new use cases.
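The pixel-aggregation step can be thought of as hysteresis region growing on the RF probability surface: seed on high-confidence pixels, then absorb moderately confident neighbors. A toy sketch of that idea on a 3×3 probability grid (illustrative thresholds and values, not the authors’ exact algorithm):

```python
from collections import deque

def grow_burned(prob, seed_t=0.8, grow_t=0.5):
    """Seed pixels with probability >= seed_t, then flood-fill
    4-neighbours whose probability is >= grow_t."""
    rows, cols = len(prob), len(prob[0])
    burned = [[False] * cols for _ in range(rows)]
    queue = deque((r, c) for r in range(rows) for c in range(cols)
                  if prob[r][c] >= seed_t)
    for r, c in queue:
        burned[r][c] = True
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols \
                    and not burned[nr][nc] and prob[nr][nc] >= grow_t:
                burned[nr][nc] = True
                queue.append((nr, nc))
    return burned

prob = [[0.1, 0.6, 0.9],
        [0.2, 0.7, 0.6],
        [0.1, 0.2, 0.55]]
mask = grow_burned(prob)
print(sum(cell for row in mask for cell in row))  # → 5 burned pixels
```

Pixels at 0.5–0.8 are kept only when connected to a high-confidence seed, which suppresses isolated false positives much as the shaping step does.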
To better interpret fire severity in terms of on-the-ground fire effects, as opposed to non-standardized spectral indices, [189] produced a map of the composite burn index (CBI), a frequently used, field-based measure of fire severity. A RF model was built on GEE, describing CBI across forested landscapes in North America as a function of multiple spectral indices, climatic variables, and geographic coordinates. The robust relationships and the fairly high model skill in most regions suggest the resulting CBI maps may be beneficial in remote regions where field measures of severity are expensive and difficult to acquire (e.g., Alaska and much of Canada).

Appendix C.9. Textual Summaries for Heavy Industry/Pollution Monitoring

The authors in [197] used time series of the Soil Adjusted Total Vegetation Index (SATVI), calculated from Landsat 5 imagery, to track changes and assess vegetation regrowth on 365 abandoned well pads located across the Colorado Plateau. BFAST (Breaks for Additive Season and Trend) time-series models were used to fit temporal trends, identifying when vegetation was cleared from each site and the magnitudes and rates of vegetation change after abandonment. The time series metrics were used to calculate the Relative Fractional Vegetation Cover (RFVC) of each pad, a measure of post-abandonment vegetation cover relative to pre-drilling condition. Cover change values were standardized against vegetation cover at nearby reference pixels undisturbed by energy development, selected using an automated reference site selection algorithm. Statistical modeling using linear regression and a RF was performed to identify the environmental and/or management variables most related to the RFVC response. The results suggested that reclamation efforts on abandoned oil and gas pads of the Colorado Plateau had mixed outcomes. A substantial amount of year-to-year variability in RFVC corresponded to moisture conditions assessed using an index of evaporation and drought (SPEI). Both the time series analysis and the statistical modeling were performed in R.
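The RFVC idea can be sketched as pad recovery normalized by the change observed at undisturbed reference pixels over the same period; this is an illustrative formulation with hypothetical cover fractions, not necessarily the exact equation used in [197]:

```python
def rfvc(pad_post, pad_pre, ref_post, ref_pre):
    """Relative Fractional Vegetation Cover: the pad's recovery ratio
    normalised by the same-period ratio at undisturbed reference pixels
    (a sketch of the concept, not the formulation in [197])."""
    return (pad_post / pad_pre) / (ref_post / ref_pre)

# The pad regrew to 60% of its pre-drilling cover while reference sites,
# affected only by climate, retained 80% of theirs.
print(round(rfvc(0.30, 0.50, 0.40, 0.50), 3))  # → 0.75
```

Dividing by the reference-site ratio separates incomplete reclamation from region-wide drought effects, which is why the reference pixels matter.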
The authors in [198] presented a mapping study of mining areas in the Brazilian Amazon using Sentinel-2A images and the CART classifier in GEE. The map was then exported to ArcGIS, in which the data provided by the Brazilian National Department for Mineral Production (DNPM), such as license status and mineral type, were integrated with the mining map. The mapping results were compared to high-resolution RapidEye imagery. The area occupied by each mining category was computed, providing key information for the environmental management of mining activities.
In [202], the authors made use of Landsat imagery and the LandTrendr algorithm to monitor water accumulation in subsidence areas of past mining in China. First, they identified permanent versus seasonal water bodies, then used a water index in areas of known mining to track water changes. The authors incorporated a popular subsidence simulator that predicted water accumulation at underground mining sites and showed that their dataset had good agreement with it. Thus, their processing workflow can be integrated with the simulator to verify its output. While the authors achieved high overall accuracy, it varied dramatically between years and between stages of water accumulation. The authors noted that more work was needed to increase the robustness of their processing pipeline so it could more accurately distinguish water accumulation at mining sites from flooding and heavy rainfall events.
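The summary does not specify which water index [202] used; one common choice for separating open water from land in Landsat imagery is the Modified NDWI (Xu, 2006), sketched here as an illustration with hypothetical reflectances:

```python
def mndwi(green, swir1):
    """Modified Normalized Difference Water Index (Xu, 2006):
    (Green - SWIR1) / (Green + SWIR1). Open water is strongly positive
    because water absorbs SWIR; built-up land tends negative."""
    return (green - swir1) / (green + swir1)

print(round(mndwi(0.12, 0.04), 3))  # open water → 0.5
print(round(mndwi(0.10, 0.25), 3))  # dry land → negative
```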
To monitor mining disturbances at a coalfield in Mongolia, [199] used the LandTrendr algorithm to analyze Landsat data. The authors designed a fast, efficient method on the GEE platform to monitor surface mining operations and showed that only 26% of promised reclamation was undertaken at the Shengli Coalfield. However, the authors noted that their pixel-based classification approach would benefit from comparison with an object-based approach (although many object-based classifiers are not available on GEE).
To keep track of mines and dams in Brazil, [200] used two different CNNs: one to classify potential mining sites and another to classify their perceived/potential environmental risk. With this two-phase approach, the authors were able to identify 263 unregistered mines, and they designed the CNNs to work on variable-sized RS images. This analysis relied on government data, which may not be available in other locations where mining takes place. Additionally, since the authors used a DL approach, they had to move their training process from GEE to Google Colab. Even so, their data were too big for the GPU memory limits.
Using the GEE JavaScript API, [201] trained RF classifiers to produce maps of mine waste extents from the Landsat-8, Sentinel-1, and Sentinel-2 archives. The simplest method of mapping mines is thresholding, where a division between the spectral response of mines and non-mine areas can be clearly defined; thresholding only produces high accuracy when the spectral response of mines differs significantly from the surrounding non-mine areas. Although the interpreter attempted to collect training points representative of all of the mine types as well as the variability in the other classes, more training data may be required to better distinguish classes as similar as outcrops/rock, mines, and urban areas. The RF classification algorithm computes the Mean Decrease in Accuracy (MDA), which is commonly used to assess variable importance. No functions exist within GEE (yet) to analyze variable importance in a RF classifier, so this was completed using extracted training data values in R.
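The MDA idea can be approximated outside GEE with permutation importance: shuffle one feature column at a time and measure the resulting drop in accuracy. Below is a generic, library-free sketch with a hypothetical toy classifier, not GEE’s or R’s implementation:

```python
import random

def permutation_importance(model, X, y, metric, n_repeats=10, seed=0):
    """Mean decrease in accuracy when one feature column is shuffled:
    the idea behind MDA scores (a generic sketch)."""
    rng = random.Random(seed)
    base = metric(model(X), y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            Xp = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            drops.append(base - metric(model(Xp), y))
        importances.append(sum(drops) / n_repeats)
    return importances

accuracy = lambda preds, y: sum(p == t for p, t in zip(preds, y)) / len(y)
# Toy "classifier": thresholds feature 0 and ignores feature 1 entirely.
model = lambda X: [1 if row[0] > 0.5 else 0 for row in X]
X = [[0.9, 0.1], [0.8, 0.9], [0.2, 0.2], [0.1, 0.8]]
y = [1, 1, 0, 0]
imp = permutation_importance(model, X, y, accuracy)
print(imp[1])  # → 0.0 (shuffling the ignored feature never hurts accuracy)
```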
To test the efficacy of different ML algorithms for identifying waste and dump sites in optical imagery, [203] optimized the parameters of the CART, RF, and SVM algorithms available on GEE. The authors found that the RF algorithm was by far the most accurate, even when several optimization schemes were applied to each model. However, the authors noted that a lack of elevation data in their processing pipeline led to classification errors and that more work could be done using DL methods to identify waste and dump piles in the future.

Appendix C.10. Textual Summaries for Climate and Meteorology

In [204], MODIS satellite observations from 2000 to 2015 were analyzed using GEE to derive global snow-free land surface albedo estimates and trends at 500 m resolution. The bulk of the albedo trends can be attributed to rainfall, changes in agricultural practices, and snow cover duration. The study confirmed that, at the local scale, albedo changes were consistent with land cover/use changes driven by anthropogenic activities such as deforestation, irrigation, and urbanization.
The authors in [210] proposed a downscaling framework (from 25 km to 1 km) for TRMM precipitation products by integrating GEE and Google Colab. Three ML methods, a Gradient Boosting Regressor (GBR), a Support Vector Regressor (SVR), and an ANN, were used to establish the relationship between precipitation and four environmental variables: elevation, longitude, latitude, and one of three vegetation indices (NDVI, EVI, LAI). The StandardScaler algorithm of scikit-learn was used to standardize the variables using their means and standard deviations, eliminating the effects of differing scales. The GridSearchCV algorithm with a 10-fold cross-validation splitting strategy was used to identify the best hyperparameter values for each ML method-vegetation index combination. Monthly precipitation maps were derived from the annual downscaled precipitation by disaggregation. According to validation in the Great Mekong upstream region, the ANN method yielded the best performance when simulating annual TRMM precipitation. The most sensitive vegetation index for downscaling TRMM was LAI, followed by EVI.
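The standardization step is straightforward to reproduce: z-score each variable with its mean and population standard deviation, which mirrors scikit-learn StandardScaler’s default behavior. A stdlib-only sketch with hypothetical values:

```python
from statistics import mean, pstdev

def standardize(col):
    """Z-score one variable with its mean and population standard
    deviation, as scikit-learn's StandardScaler does by default."""
    m, s = mean(col), pstdev(col)
    return [(v - m) / s for v in col]

elevation = [100.0, 200.0, 300.0]   # metres
ndvi = [0.2, 0.4, 0.6]              # unitless
z_elev, z_ndvi = standardize(elevation), standardize(ndvi)
print([round(v, 3) for v in z_elev])  # → [-1.225, 0.0, 1.225]
```

After standardization both variables share the same z-score scale, so neither dominates a distance- or gradient-based learner simply because of its units.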
The authors in [205] performed major-axis regression on Landsat 7 ETM+, Landsat 8 OLI, and Sentinel-2 MSI imagery in pairs (7 ETM+/8 OLI, 7 ETM+/2 MSI, and 8 OLI/2 MSI) across the entire conterminous United States and were able to determine cross-platform correction coefficients for the Blue, Green, Red, NIR, and SWIR bands present on all three satellites. The authors then validated their methodology and correction coefficients by analyzing the same satellite platforms across Europe. While [205] did not create an actual integrated dataset for use on the GEE platform, their research was the first step towards building such a dataset and ensuring that it is of high quality.
The authors in [206] implemented a cloud-based workflow and compared it to the traditional method of using SAGA GIS to produce local climate zone maps of cities based on data such as WUDAPT. The authors showed that the traditional method was more accurate on average than the GEE method when using only the datasets available to WUDAPT and when trying to transfer an urban morphology classifier between individual cities. However, GEE allowed the authors to aggregate information from multiple cities in the same climate zone and to train their RF model on additional RS data and derived indices not available in the WUDAPT dataset. These improvements boosted OA scores in urban topology classification. Thus, while the GEE and more traditional classification methods are not directly comparable, the cloud-based method outlined by [206] can complement research being done in urban topology studies.
The authors in [207] investigated the impacts of landscape changes on LST intensity (LSTI) in a tropical mountain city in Sri Lanka. Annual median temperatures for three years were extracted from Landsat data through the GEE interface. The SVM algorithm was used for LULC mapping, which was then used to calculate the fractions of built-up, forested, and agricultural land based on urban-rural zone analysis. The study showed that rapid development was spreading towards rural zones and that the fraction of built-up land influenced the increase in annual mean LST. The authors recommended maintaining a mixture of land-use types to help control the increasing LST in the study area.
The authors in [208] presented a method to obtain high-resolution sea surface salinity (SSS) and temperature (SST) from raw satellite data, i.e., Sentinel-2 Level 1-C Top of Atmosphere reflectance. A deep NN was built to link band information with in situ data obtained from the Copernicus Marine In Situ platform. The best-performing deep NN was composed of 20 hidden layers with 43 nodes each. Shortcut connections were used in the network architecture to avoid the so-called vanishing gradient problem, providing improved performance compared with the equivalent feed-forward architecture. Accurate salinity values were estimated without using temperature as an input to the network. However, a clear dependency on temperature ranges was observed, with less accurate estimates for locations where ocean temperature falls below 10 °C. The NN presented in this paper outperformed the classical architectures tested for regression problems.
To study carbon flux dynamics further, the authors in [209] used a LSTM and compared its performance to a RF for carbon fluxes in global forests. They combined bioclimatic and forest age data with Landsat imagery and MODIS atmospheric reflectance maps as input to their models. The authors showed that the previous seasons’ water and temperature records (specifically from the spring) affected how forests release carbon in the current season. Still, the LSTM model used in [209] struggled when trained on one site or one forest type and applied to another. For instance, their ML and DL models did not perform well in the Tropics and had varying performance predicting carbon flux for evergreen and deciduous forests. This lack of generalizability reflected how carbon fluxes vary from forest to forest around the world, but also the fact that their dataset was biased towards older, undisturbed forests, which led the LSTM to underperform on the underrepresented classes.

Appendix C.11. Textual Summaries for Disaster Management

The authors in [211] proposed a new method for mapping landslides in Nepal using RF. Landsat images acquired between 2012 and 2016 were processed using GEE and used to compute spectral indices and derive texture information. In addition, DEM data were used to characterize landscape patterns. An RF model was constructed from the spectral indices, texture information, and landscape patterns and applied to Central Nepal, identifying landslides with reasonable accuracy. There are several limitations to this study. First, GEE was used only as a preprocessing platform; some analyses were conducted outside GEE. Second, the study area was limited to a single Landsat scene. Last but not least, the accuracy varied substantially depending on the distribution and availability of training samples.
The authors in [212] analyzed vegetation, thermal, moisture, and climate datasets, along with surface drainage records, using a RF model on the GEE platform to create surface drainage maps. In addition, the authors used optical and SAR imagery and completed a relative variable importance analysis with the RF model. They found that surface drainage maps were sensitive to RS data scale, and they identified soil properties and land surface temperature as important features in their predictions. However, their method was not able to predict all land class types equally well, and the authors noted that their approach may not work in other areas that lack data such as government surface drainage permit records.
The authors in [213] took advantage of the easy-to-find data and freely available compute on GEE to produce flood maps of Bangladesh. First, they used pre-flood Landsat imagery to train a CART model and make a land-use map for the country. Then, the authors analyzed Sentinel imagery with a geographic object-based image analysis (GEOBIA) model to produce water versus non-water classification maps. These maps were then combined to show which land-use types in different parts of Bangladesh were flooded and for how long. While the authors achieved high accuracy rates, their method struggled to differentiate flooded crop fields from other inundated areas, which could be addressed by overlaying a crop-use map to remove mislabeled areas. However, it is important to note that using GEE for real-time hazard response is not yet advisable, given the lag between when RS imagery is collected and when it is uploaded to the platform.
The authors in [214] presented a case study of the 2018 Kerala flood in India. They demonstrated how GEE can be used to process large optical and SAR RS datasets, in conjunction with field and precipitation data, using image processing techniques to produce high-resolution flood maps over a large area. This processing flow, called GEE4Flood, processes large datasets quickly to produce flood maps. However, the authors noted that several challenges remain in making their algorithm operational: the input data need a cloud-free pre-flood reference image of the area, and Otsu’s thresholding algorithm needs the two classes (flooded versus non-flooded) to be relatively balanced in frequency, which may well not hold for heavily inundated scenes. Additionally, it is difficult to obtain in-situ data from flooded regions while the flood is happening, making the results hard to validate. Lastly, the GEE platform has a significant delay, up to several days, in uploading the most recent RS imagery, making it unsuitable for real-time flood forecasting.
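Otsu’s method itself is simple to state: choose the threshold that maximizes the between-class variance of the image histogram. A self-contained sketch on a toy bimodal sample (hypothetical backscatter values; illustrative only, not GEE4Flood’s code):

```python
def otsu_threshold(values, nbins=64):
    """Otsu's method: scan histogram splits and keep the threshold that
    maximises between-class variance w0*w1*(mu0 - mu1)^2."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / nbins
    hist = [0] * nbins
    for v in values:
        hist[min(int((v - lo) / width), nbins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(nbins)]
    total = len(values)
    total_sum = sum(c * h for c, h in zip(centers, hist))
    best_t, best_var, w0, sum0 = lo, -1.0, 0, 0.0
    for i, h in enumerate(hist):
        w0 += h
        sum0 += centers[i] * h
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (total_sum - sum0) / w1
        between_var = w0 * w1 * (mu0 - mu1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, lo + (i + 1) * width
    return best_t

# Toy bimodal SAR backscatter: open water near -18 dB, land near -8 dB.
backscatter = [-18.5, -18.0, -17.5, -19.0, -8.5, -8.0, -7.5, -9.0]
t = otsu_threshold(backscatter)
print(-17.5 < t < -9.0)  # → True: the threshold falls between the modes
```

The class-balance caveat noted above is visible in the formula: when one class dominates, w0*w1 is small for every split near the true boundary, so the chosen threshold drifts.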
To assess the suitability of GEE for disaster recovery, the authors in [215] used a RF model trained on Landsat imagery to perform change detection on pre- and post-disaster areas in the Philippines. However, the authors found that a lack of cloud-free VHR imagery degraded model performance, especially in complex urban environments in the aftermath of a hurricane. In the future, SAR imagery and DL methods could be used to increase model accuracy.
Using RGB images as input, the authors in [216] proposed an automatic building detection method to find buildings and their irregularities in pre- and post-disaster (sub-)meter resolution images. First, a knowledge-based method that utilized shadow information was combined with an edge-based method that uses texture information to map buildings in temporal pre-disaster images. Then, a two-level fusion using spectral and georeferenced features was applied to find building irregularities in post-disaster images. Building facades and rooftops were also considered in the oblique imagery. This method was implemented on the GEE platform and evaluated using oblique images from Hurricane Nate (2017) and Hurricane Harvey (2017). Temporal pre-disaster data were provided by NAIP, which acquires aerial imagery at 1 m resolution during the agricultural growing seasons. NOAA provided post-disaster data at nadir and at oblique angles varying by about 30 degrees. Some post-disaster images were manually uploaded to GEE servers for evaluation.

Appendix C.12. Textual Summaries for Soil

To determine how different datasets and ML models perform in predicting soil organic matter, the authors in [222] compared an ANN, RF, and SVR model with MODIS, Sentinel-2A, and DEM data as input. They found that, for all models, Sentinel-2A data yielded better performance due to their higher spectral and spatial resolution. Among the models, the RF performed best, making the RF trained on Sentinel-2A data the best combination. The authors also examined which input bands were correlated with better predictive performance and found that indices (e.g., NDVI, NDWI) were not correlated in either dataset, while the elevation, SWIR, and RGB bands were. This type of analysis is important because it addresses not only which dataset matters, but which data are worth including given data availability. However, the authors cautioned that more work needs to be done to make their model more generalizable and robust, for example by incorporating different types of data, or data from outside their study region, so that the model has more variation to learn from.
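The band-screening step described above boils down to computing, for each candidate input, its correlation with the target soil property. A minimal Pearson correlation sketch on hypothetical values:

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient between a candidate input band
    and the target soil property (screening sketch, not [222]'s code)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

elevation = [120.0, 150.0, 180.0, 210.0]  # hypothetical sample sites
soil_om = [1.2, 1.5, 1.8, 2.1]            # perfectly linear with elevation
print(round(pearson(elevation, soil_om), 3))  # → 1.0
```

Bands whose |r| stays near zero, as the indices did in [222], are candidates for exclusion, shrinking the feature set before training.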
The authors in [217] produced an early soil mapping study on the GEE platform in 2015. They used a Rifle Serial Classifier to test out soil-type classification and a CART for soil organic matter percentage regression over the entire contiguous United States. Their methods at the time, while poor, matched other comparable studies in digital soil mapping, meaning that GEE was not a limitation on performance. The cloud computing platform sped up their processing time from 1.5–3.5 h down to 2 min. However, despite the freely available compute, the authors noted several limitations with the platform. First, a major limitation of the platform (then still in its early stages) was a lack of processing methods like kriging and uncertainty analysis. Second, the authors came up against processing limits when using a large number of field samples. According to many other authors included in this review, these two issues are still some of the top cited limitations on the platform. Lastly, the authors noted that while a researcher can make data and code scripts private, ultimately the analysis is stored on a remote server so GEE may not be suitable for analyzing, storing, or transmitting sensitive data.
The authors in [218] explored GEE’s potential for making a global soil salinity map based on field data and Landsat thermal infrared imagery. GEE allowed the authors to run their processing steps quickly, though creating thermal mosaics on the platform still took hours. However, because the field sample dataset the authors used was sparse, they achieved accuracy rates of only 67–70%. Visual analysis of their results showed that some regions with few field samples were correctly classified at the regional scale, while others were considerably overestimated. In their conclusion, the authors noted that many researchers may be hesitant to use the platform since model and processing function implementations may not be known and could change without the researcher knowing.
Using field observations, DEM data, and Landsat imagery, the authors in [219] sought to address these issues by mapping different soil types and soil attributes across a large region in Brazil on the GEE platform. The authors showed that elevation, climate data, and the SWIR2, NIR, and Blue bands from Landsat imagery were the most important factors in determining soil types, even at different soil depths. However, the authors noted that more soil observations were needed to increase the accuracy of their method and to aid further digital soil mapping studies.
The authors in [221] produced a global, high-resolution soil moisture map on GEE using optical, thermal, and SAR imagery in addition to DEM data. The authors trained a GBRT model on in-situ observations paired with RS imagery to predict soil moisture in other locations. After a relative variable importance analysis, the authors found that optical RS imagery and land-cover information played the most important roles in determining soil moisture content, but that SAR imagery and soil data also contributed significantly to the model’s overall performance. This finding echoes other studies’ results ([95,161,182]) showing that combining optical and SAR data improves predictive outcomes. The entire processing pipeline is now an open-source Python package (PYSMM). However, the authors had issues with the GEE platform: the model needed to be trained offline due to issues with flexibility and design, and the validation soil moisture observation dataset was not available on the platform. The authors noted that sparse or clustered observations led to model inaccuracies, which was a call both to collect more soil moisture observations and to upload more of them (and other diverse types of data) to the GEE platform.
The authors in [220] explored the effects of spatial aggregation of climatic, biotic, topographic, and soil variables on national estimates of litter and soil C stocks and characterized the spatial distribution of litter and soil C stocks in the conterminous United States (CONUS). Litter and soil variables were measured on permanent sample plots of the National Forest Inventory (NFI) from 2000 to 2011. These data were combined with vegetation phenology estimated from Landsat 7 imagery and raster data describing environmental variables across CONUS to predict litter and soil carbon stocks. Specifically, the growing-season maximum of NDVI and forty categorical and continuous environmental variables, compiled from various data sources and resolutions with ArcGIS, were selected as predictors. Three supervised ML methods (i.e., RF, quantile regression forest (QRF), and KNN) were chosen to model the distribution of litter and soil carbon stocks. All analyses were conducted in R. The results suggested that the RF and QRF models performed better than the KNN models, although results across the three methods were similar. All modeling approaches performed better for soil than for litter layers, and the spatial patterns of association between litter, soil carbon, and environmental covariates observed in the RF and QRF models may reflect spatial patterns in litter decomposition, soil chemistry, and plant and microbial communities.

Appendix C.13. Textual Summaries for Cloud Detection and Masking

Researchers in [223] treated cloud detection as a change detection problem across time using a kernel ridge regression model. This allowed them to detect nonlinear features that are easier to identify in RS time series imagery. The authors tested their algorithm on Landsat and SPOT imagery and showed that it performed better than Fmask while producing fewer false positives during classification. Additionally, the authors in [223] implemented their model directly on GEE so that it can be run alongside other preprocessing tasks without switching to an outside cloud or offline coding environment.
Cloud detection methods for optical satellite images can be divided into monotemporal (single-scene) and multitemporal approaches. Single-scene approaches use only the information in a given image to build the cloud mask, while multitemporal approaches also exploit previously acquired images, collocated over the same area, to improve cloud detection accuracy. Multitemporal methods are computationally demanding, and most multitemporal cloud detection schemes cast the problem as change detection. The authors in [224] implemented a multitemporal cloud detection method using the GEE Python API, applied it to Landsat-8 imagery, and validated it over a large collection of manually labeled cloud masks from the Biome dataset. The approach was based on a simple multitemporal background modeling algorithm, in which k-means clustering was applied to the difference image between the cloudy image (target) and the cloud-free estimated background (reference). The resulting clusters were then labeled as cloudy or cloud-free by applying a set of thresholds to the difference intensity and the reflectance of the representative clusters. This approach was found to outperform single-scene, threshold-based cloud detection approaches such as Fmask (Zhu et al., 2015). More specifically, linear and nonlinear least squares regression algorithms were proposed to minimize the prediction and estimation errors simultaneously, and significant differences in the image of interest with respect to the estimated background were identified as clouds. The use of kernel methods allowed the algorithm to be generalized to account for higher-order (nonlinear) feature relations. The method was tested on 5-day revisit time series from SPOT-4 at high resolution and on Landsat-8 time series.
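The clustering step can be sketched as two-cluster k-means on the per-pixel target-minus-background differences, labelling the cluster with the larger mean difference as cloud. This is a simplified, single-band illustration with hypothetical values, not the authors’ implementation:

```python
def kmeans_1d(values, iters=20):
    """Two-cluster 1-D k-means; returns (low_mean, high_mean). On
    |target - background| differences, the high cluster is 'cloud'."""
    c0, c1 = min(values), max(values)  # simple, well-separated init
    for _ in range(iters):
        g0 = [v for v in values if abs(v - c0) <= abs(v - c1)]
        g1 = [v for v in values if abs(v - c0) > abs(v - c1)]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return (c0, c1) if c0 < c1 else (c1, c0)

# Differences between a cloudy scene and its cloud-free background:
# small over stable land, large under bright clouds.
diffs = [0.01, 0.02, 0.03, 0.02, 0.35, 0.40, 0.38, 0.02]
clear_mean, cloud_mean = kmeans_1d(diffs)
print(clear_mean < 0.1 < cloud_mean)  # → True
```

In the full method the clusters would be formed over multiple bands and then filtered with the reflectance thresholds described above.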
A CNN model, called DeepGEE-CD, was built in [225] to detect clouds in RS imagery directly on the GEE platform. The authors first developed and trained the CNN locally and then uploaded the weights to GEE. They then implemented most of the network’s layers on the platform, with the exception of a few convolutional layers that were too complicated to be coded directly in GEE. The resulting CNN can run inference directly in the cloud. In addition, the authors made the model flexible enough to handle RS imagery of varying input sizes. The CNN achieves performance comparable to the Fmask algorithm, but without the additional information, in the form of physical rules, that Fmask needs to work well.
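Re-implementing CNN layers with GEE image operations ultimately reduces to expressing sums of shifted, weighted bands. The underlying arithmetic of a single “valid” convolution layer is shown below with plain loops for clarity (an illustration of the concept, not DeepGEE-CD’s code):

```python
def conv2d(img, kernel):
    """'Valid' 2-D convolution (cross-correlation, as in DL frameworks):
    each output pixel is the kernel-weighted sum of its neighbourhood."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(ow)] for i in range(oh)]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
mean3x3 = [[1 / 9] * 3 for _ in range(3)]  # 3x3 mean filter as the kernel
print(round(conv2d(img, mean3x3)[0][0], 3))  # → 5.0
```

On GEE the same sum of shifted, weighted neighbours can be expressed per band with neighborhood/kernel image operations, which is why simple layers port readily while more elaborate ones do not.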
To explore how CV algorithms and ML models can be used together on GEE, the authors in [226] combined the existing Cloud-Score algorithm with a SVM to detect clouds in imagery from Amazonian tropical forests, Hainan Island, and Sri Lanka. The Cloud-Score algorithm first masked the input RS imagery, which was then used to train the SVM. This process led to much higher accuracy rates than any of the other CV algorithms for cloud detection, with considerably lower error rates.
The authors in [227] implemented their cloud removal DL model directly in GEE. Their model, DeepGEE-S2CR, is a cloud-optimized version of the DSen2-CR model presented in [228] and fuses co-registered Sentinel-1 and Sentinel-2 images from the SEN12MS-CR dataset. First, the authors trained the CNN locally and then uploaded its weights to GEE. They then rebuilt the network using the GEE API, implementing layers and custom cost functions so that the CNN fits within GEE's memory constraints. The authors showed that their model had a slight reduction in RMSE but produced very similar results to the larger and more compute-intensive DSen2-CR. The CNN can be run directly on GEE without the need to download, store, and process data locally.

Appendix C.14. Textual Summaries for Wildlife and Animal Studies

UAS were explored in [229] for identifying Ny. darlingi breeding sites in Amazonian Peru using high-resolution imagery (~0.02 m/pixel) and its multispectral profile. RGB and multispectral imagery were collected simultaneously, and the multispectral bands were found to add critical information for differentiating the water bodies. All multispectral orthomosaics were uploaded to GEE assets and an RF classification was performed. The findings support the use of low-cost UASs and the GEE platform to achieve a highly accurate classification separating water bodies that harbor Ny. darlingi larvae from those that do not, opening new ways to control and survey malaria in affected settings. The portability of UASs allows investigators to navigate moderately hostile and complex environments and to generate maps with a higher resolution than those available from satellites. However, transferring the imagery from local storage to GEE requires a stable internet connection, and ways to speed up image transfer and processing still need to be developed.
A set of freely available environmental variables (i.e., habitat information from RS observations and climatic information from weather stations) was used in [230] to assess and predict roadkill risk. For each of seven medium-to-large mammal species, the authors performed binomial logistic regressions relating roadkill presence-absence in road sections across the survey dates to the collection of environmental variables (land cover classes, forest cover, distance to rivers, temperature, precipitation, and NDVI) and to the temporal and spatial trends of overall roadkill. The intrinsic spatial and temporal roadkill risk were the most important variables, followed by land cover, climate, and NDVI. This modeling framework, coupling RS information, climate data, traffic volume, and biodiversity metrics, may provide more accurate roadkill risk predictions in near real time and potentially at the global scale.
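Each per-species model above is an ordinary binomial logistic regression of presence-absence on environmental covariates. A minimal NumPy sketch with synthetic data; the covariate names and effect sizes are illustrative assumptions, not values from [230].

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, iters=2000):
    """Binomial logistic regression fit by gradient ascent on the log-likelihood."""
    Xb = np.hstack([np.ones((len(X), 1)), X])      # intercept column
    beta = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        beta += lr * Xb.T @ (y - p) / len(y)       # gradient step
    return beta

def predict_prob(X, beta):
    Xb = np.hstack([np.ones((len(X), 1)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ beta))

# synthetic road sections: roadkill more likely near rivers (small distance)
rng = np.random.default_rng(1)
dist_to_river = rng.uniform(0, 1, 500)
ndvi = rng.uniform(0, 1, 500)
logit = 2.0 - 5.0 * dist_to_river + 1.0 * ndvi
y = (rng.uniform(size=500) < 1 / (1 + np.exp(-logit))).astype(float)
X = np.column_stack([dist_to_river, ndvi])
beta = fit_logistic(X, y)
train_probs = predict_prob(X, beta)
```

The fitted coefficient on distance-to-river should come out negative, recovering the planted relationship.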
A semi-automated framework was developed in [231] for monitoring large, complex wildlife aggregations using drone-acquired imagery over four large and complex waterbird colonies. The approach applied an RF classifier to high-resolution drone imagery to identify nests, followed by predictive modeling (with k-fold estimation) to estimate nest counts from the mapped nest area. Arithmetic and textural metrics from the red, green, and blue channels in the drone data were calculated and used as predictor variables in the RF classification, which helped capture more of the spatial and spectral variation in the target features. The predictor variable calculation and nest mapping routines using RF classification were implemented in GEE, while all statistical analyses, including nest counting and accuracy assessment, were performed in the R programming environment.
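The count-estimation step is essentially a regression of nest counts on mapped nest area, validated with k-fold estimation. A simplified NumPy sketch on synthetic data (the study derived nest area from the RF classification and ran these analyses in R; the counts-per-area relationship below is an assumption for illustration):

```python
import numpy as np

def kfold_rmse(x, y, k=5):
    """k-fold cross-validated RMSE for a linear regression of counts on area."""
    idx = np.arange(len(x))
    folds = np.array_split(idx, k)
    errs = []
    for f in folds:
        train = np.setdiff1d(idx, f)               # all indices not in this fold
        slope, intercept = np.polyfit(x[train], y[train], 1)
        pred = slope * x[f] + intercept
        errs.append(np.mean((pred - y[f]) ** 2))
    return float(np.sqrt(np.mean(errs)))

rng = np.random.default_rng(0)
nest_area_m2 = rng.uniform(5, 50, 100)             # mapped nest area per plot
nest_count = 2.0 * nest_area_m2 + rng.normal(0, 3, 100)
rmse = kfold_rmse(nest_area_m2, nest_count)
```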
Using Landsat RS imagery, climate variables, and government environmental data, the authors in [232] analyzed Pine Processionary Moth outbreaks in pine forests in southern Spain. They first used a KNN to determine which features, derived from various vegetation indices and environmental variables, were most informative. Then, after choosing a representative subset of their data based on the KNN's output, they used an RF to predict pest outbreaks from ground-truth defoliation data. The authors found that minimum temperatures in February and seasonal precipitation patterns were the best predictors of pest outbreaks, followed by vegetation indices. While access to medium-resolution imagery helped the authors map pest outbreaks in pine forests over a large area of Spain, they noted that more work should be done to collect additional ground-truth data and to explore higher-resolution data products such as those from the Sentinel satellites.
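The KNN step can be sketched as a plain nearest-neighbor majority vote. This NumPy toy example uses two synthetic, well-separated classes rather than the study's vegetation-index and climate features:

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=3):
    """Classify each query point by majority vote of its k nearest training points."""
    preds = []
    for q in X_query:
        d = np.sqrt(((X_train - q) ** 2).sum(axis=1))   # Euclidean distances
        nearest = y_train[np.argsort(d)[:k]]
        vals, counts = np.unique(nearest, return_counts=True)
        preds.append(vals[np.argmax(counts)])
    return np.array(preds)

# two well-separated synthetic classes (e.g., defoliated vs. healthy stands)
X_train = np.array([[0., 0.], [0., 1.], [1., 0.], [5., 5.], [5., 6.], [6., 5.]])
y_train = np.array([0, 0, 0, 1, 1, 1])
pred = knn_predict(X_train, y_train, np.array([[0.5, 0.5], [5.5, 5.5]]))
```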

Appendix C.15. Textual Summaries for Archaeology

The potential role of GEE in the future of archaeological research was demonstrated in [233] through two case studies. WorldView-2 satellite imagery, with eight spectral bands and a spatial resolution of 1.84 m, provided the base for analysis in both cases. The first case used an RF classifier in GEE to automatically identify specific archaeological features across the landscape of the archaeologically rich Faynan region of southern Jordan. The second case used the Canny edge-detection algorithm in GEE for automatic vectorization of archaeological sites. The authors noted that the vectorization was not appropriate for detailed mapping at a subsite scale unless the results were modified by a smoothing function. At a site-wide or regional scale, however, the results can successfully identify the main features in the landscape.
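The edge-based vectorization in the second case study starts from an edge map. The sketch below is a deliberately simplified stand-in for Canny (Sobel gradient magnitude plus a single threshold); GEE's own `ee.Algorithms.CannyEdgeDetector` additionally applies Gaussian smoothing and exposes tunable threshold and sigma parameters.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def convolve_valid(img, k):
    """'Valid' 3x3 convolution (no padding)."""
    out = np.zeros((img.shape[0] - 2, img.shape[1] - 2))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + 3, j:j + 3] * k).sum()
    return out

def edge_map(img, thresh=1.0):
    """Binary edges where the Sobel gradient magnitude exceeds a threshold."""
    gx = convolve_valid(img, SOBEL_X)
    gy = convolve_valid(img, SOBEL_Y)
    return np.hypot(gx, gy) > thresh

# a vertical step edge: bright 'structure' against darker ground
img = np.zeros((10, 10))
img[:, 5:] = 1.0
edges = edge_map(img)
```

The detected edges would then be traced and exported as vector features, which is the step [233] found needed smoothing before subsite-scale use.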
Drone imagery and GEE were used in [234] to detect potsherds in the field, in the hopes of speeding up this process. The authors trained a CART, an RF, and an SVM on drone imagery, but only the RF model produced adequate results. They tested their workflow in two separate locations in Greece. In their processing pipeline, the RF model was set to output probabilities for where potsherds occur in part of a drone image. The authors then iterated over the data three separate times, subjectively choosing a threshold at every iteration to suppress false positives. This is generally bad practice in ML research, because it means that humans are actively changing the results of the analysis before releasing them. Perhaps most importantly, the authors vectorized their results at the end of the analysis so that other researchers can use them for visualization or classification tasks. This points to an urgent need in EO and ML research: more studies should attempt to vectorize their data instead of producing only binary or multi-class classification maps. However, their analysis depends on having an internet connection to upload, process, and classify data with GEE in the field, which is not always possible and may limit the future utility of their work. The authors also mention data and compute limits on GEE as a main limitation of their analysis. For example, every image uploaded to GEE (at the time of the paper's release) was limited to 10 GB; because the authors used sub-centimeter drone imagery, they had to downsample each image before uploading it, resulting in a loss of resolution.
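One reproducible alternative to the subjective, iterative thresholding criticized above is to pick the cutoff from a labeled validation sample using an explicit criterion, for example maximizing precision subject to a recall floor. A NumPy sketch with synthetic detector scores (the criterion and all numbers are illustrative assumptions, not from [234]):

```python
import numpy as np

def pick_threshold(probs, labels, min_recall=0.8):
    """Choose the highest-precision probability cutoff that still keeps
    recall above a floor, instead of tuning the threshold by eye."""
    best_t, best_p = 0.5, -1.0
    for t in np.linspace(0.05, 0.95, 19):
        pred = probs >= t
        tp = np.sum(pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        fn = np.sum(~pred & (labels == 1))
        if tp == 0:
            continue
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        if recall >= min_recall and precision > best_p:
            best_t, best_p = t, precision
    return best_t, best_p

# synthetic detector scores: positives score high, negatives low but overlapping
rng = np.random.default_rng(2)
probs = np.concatenate([rng.uniform(0.5, 1.0, 50), rng.uniform(0.0, 0.6, 200)])
labels = np.concatenate([np.ones(50, dtype=int), np.zeros(200, dtype=int)])
t, p = pick_threshold(probs, labels)
```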
Optical and SAR data on GEE were used in [235] to create a classifier capable of outputting the likelihood that a mounded site exists in a given region of the Cholistan Desert in Pakistan. Conducting field surveys there is difficult because the heat and remoteness can make it unsafe, so the authors' use of an RF model to show where likely mound sites are for further analysis is valuable. However, the authors introduced some subjectivity by tweaking the probability threshold for the mound/no-mound boundary. This was necessary because of a lack of high-quality validation data, which also makes it difficult to measure the accuracy of their process.

Appendix C.16. Textual Summaries for Coastline Monitoring

An automated method was proposed in [236] to extract shorelines from Landsat and Sentinel satellite imagery. The accuracy of this method was assessed for the Sand Motor mega-scale nourishment by comparing the Satellite Derived Shorelines (SDS) to topographic surveys. The NDWI grayscale image was classified into a binary water-land image using an unsupervised greyscale classification method. A region-growing algorithm was then applied to cluster all pixels identified as water into a coherent water mask, and the SDS coordinates were smoothed using a 1D Gaussian smoothing operation to obtain a gradual shoreline. The results showed that the average accuracy of the SDS for the ideal case of cloud- and wave-free images of the Sand Motor was 1 m, well within the pixel resolution. The accuracy decreased in the presence of clouds, waves, sensor corrections, and georeferencing errors. The most important driver of inaccuracy is cloud cover, which hampers the detection of an SDS and causes large seaward deviations on the order of 200 m, followed by the presence of waves, which cause deviations of about 40 m. A seaward bias of the SDS is always present because all drivers of inaccuracy introduce a seaward shift. Surprisingly, the pansharpening method, which is intended to increase the image pixel resolution, reduces the accuracy by about a pixel at a sandy shoreline. These inaccuracies can largely be overcome by creating composite images with a moving-average time window, which results in a continuous dataset with subpixel precision (10–30 m, depending on the satellite mission).
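Two steps of this pipeline translate directly into short code: an unsupervised greyscale split of the NDWI image into water and land (Otsu's method is used here as one common choice; [236] does not name its exact classifier), and 1D Gaussian smoothing of the extracted shoreline coordinates. A NumPy sketch on synthetic, bimodal NDWI values:

```python
import numpy as np

def otsu_threshold(values, bins=64):
    """Otsu's method: the threshold that maximizes between-class variance."""
    hist, edges = np.histogram(values, bins=bins)
    hist = hist.astype(float) / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    best_t, best_var = centers[0], -1.0
    for i in range(1, bins):
        w0, w1 = hist[:i].sum(), hist[i:].sum()
        if w0 == 0 or w1 == 0:
            continue
        m0 = (hist[:i] * centers[:i]).sum() / w0
        m1 = (hist[i:] * centers[i:]).sum() / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_t, best_var = centers[i], var
    return best_t

def smooth_shoreline(x, sigma=2.0):
    """1D Gaussian smoothing of shoreline coordinates."""
    r = int(3 * sigma)
    k = np.exp(-0.5 * (np.arange(-r, r + 1) / sigma) ** 2)
    k /= k.sum()
    return np.convolve(x, k, mode="same")

# bimodal NDWI values: land around -0.3, water around +0.5
rng = np.random.default_rng(3)
ndwi = np.concatenate([rng.normal(-0.3, 0.05, 400), rng.normal(0.5, 0.05, 400)])
t = otsu_threshold(ndwi)
water = ndwi > t
```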
The capability of satellite RS to resolve the variability and trends in sandy shoreline positions at differing temporal scales was evaluated in [237]. The authors combined Landsat 5/7/8 and Sentinel-2 image datasets to extract time series of shoreline change at five long-term monitoring sites across three continents. The images were first preprocessed by applying panchromatic image sharpening and down-sampling. The sub-pixel shoreline extraction algorithm consisted of three steps: (1) image classification by a NN classifier into the four classes of ‘sand’, ‘water’, ‘white-water’, and ‘other land features’; (2) sub-pixel resolution border segmentation aided by histogram thresholding of MNDWI; and (3) tidal correction. The observed typical horizontal errors varied between an RMSE of 7.3 m and 12.7 m, indicating that pixel size is not the main source of error when extracting instantaneous shorelines from satellite imagery. Semi-variogram analysis revealed that presently available satellite imagery can resolve typical shoreline variability at scales of around 6 months and longer. Event-scale shoreline changes (e.g., rapid storm-induced shoreline retreat and a major sand nourishment) may also be captured.
Using Landsat images on the GEE platform, a method was proposed in [238] to map continuous changes in coastlines and tidal flats in the Zhoushan Archipelago during 1985–2017. The workflow consists of (1) building the full time series of MNDWI at the pixel level; (2) performing a temporal segmentation using a binary segmentation algorithm and deriving the corresponding temporal segments; (3) classifying the coastal cover types (i.e., water, tidal flats, and land) in each temporal segment based on the MNDWI features and regional tidal heights; and (4) detecting the change information, including conversion types and turning years and months. The spatial and temporal validation was implemented based on visual interpretation of Landsat images. Three major coastal change types were found, namely land reclamation, aquaculture expansion, and accretion of tidal flats, with land reclamation being the dominant coastal change.
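Step (2), temporal binary segmentation, repeatedly splits a pixel's MNDWI time series at the point that most reduces the squared error of piecewise-constant segments. A NumPy sketch of a single split on a synthetic reclamation signal (water turning to land); the full method recurses on the resulting segments and adds tidal information:

```python
import numpy as np

def best_split(series):
    """Return the index that splits the series into two constant segments
    with minimum total squared error (one step of binary segmentation)."""
    n = len(series)
    best_i, best_cost = 1, np.inf
    for i in range(1, n):
        left, right = series[:i], series[i:]
        cost = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if cost < best_cost:
            best_i, best_cost = i, cost
    return best_i

# a water pixel converted to land (reclamation): MNDWI drops partway through
rng = np.random.default_rng(4)
mndwi = np.concatenate([rng.normal(0.4, 0.05, 30), rng.normal(-0.3, 0.05, 20)])
change_point = best_split(mndwi)
```

The detected split index marks the turning date of the conversion; recursing on each side would reveal multiple conversions per pixel.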

Appendix C.17. Textual Summaries for Bathymetric Mapping

To extend bathymetry maps, researchers [239] paired field observations of coastal depths with RS imagery to train models that can then predict depths in areas where no depth information is available. The authors trained four different multiple linear regression models on sonar depths from field data collection and optical RS imagery to map bathymetric depths in three different locations near Greece, obtaining good results with a very simple, intuitive model. Still, the best-performing regression model suffered from slight under- and over-estimation depending on the region, meaning that field observations from more locations should be included to capture more of the natural variance in this domain. While crowdsourced bathymetry datasets were being collected, they were not publicly available and so could not be used in this analysis. Even if they were available, the authors note that they would likely run into GEE's compute limits, as authors working with large numbers of field samples often do. Looking ahead, the authors called for more domain-specific methods to be implemented on the platform and for a fused Sentinel-1 SAR, Sentinel-2 optical, and DEM dataset to be uploaded to GEE, which would be useful to a wide variety of researchers, not just those studying bathymetric mapping.
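Satellite-derived bathymetry of this kind regresses field-measured depths on band reflectances. A minimal NumPy least-squares sketch with synthetic data; the log-linear form and attenuation coefficients are illustrative assumptions, not the exact models fitted in [239]:

```python
import numpy as np

def fit_depth_model(bands, depth):
    """Multiple linear regression: depth ~ intercept + log-band reflectances."""
    X = np.hstack([np.ones((len(depth), 1)), np.log(bands)])
    coef, *_ = np.linalg.lstsq(X, depth, rcond=None)
    return coef

def predict_depth(bands, coef):
    X = np.hstack([np.ones((len(bands), 1)), np.log(bands)])
    return X @ coef

# synthetic training set: deeper water -> darker blue/green reflectance
rng = np.random.default_rng(5)
depth = rng.uniform(1, 20, 200)
blue = np.exp(-0.10 * depth) * rng.uniform(0.95, 1.05, 200)
green = np.exp(-0.07 * depth) * rng.uniform(0.95, 1.05, 200)
bands = np.column_stack([blue, green])
coef = fit_depth_model(bands, depth)
rmse = float(np.sqrt(np.mean((predict_depth(bands, coef) - depth) ** 2)))
```

In practice the fitted model is then applied wall-to-wall over imagery where no sonar depths exist.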
The authors in [240] used airborne LiDAR, sonar, and Landsat data with an RF model to estimate bathymetry in Japan, Puerto Rico, the USA, and Vanuatu. Because GEE limits how much data can be uploaded and analyzed at any one time, the RF model was prone to overfitting. In the end, the authors' results did not meet the standards that would allow the data to be used in practice; for that, the authors note, airborne LiDAR and sonar data would need to be combined with higher-resolution RS data such as Sentinel or WorldView imagery.

Appendix C.18. Textual Summaries for Ice and Snow

To track the changes in the cryosphere in Alaska, the authors in [241] used a CART model to map stable snow areas versus snow-loss areas for the snowfields over a wide area. Over a 19-year period, the authors found that the total area of snowfields in their region of analysis decreased by 13 km2 and that an additional 48 km2 transitioned from stable snow fields to ablation zones. However, [241] noted that their automated approach classified both new snow loss and seasonal snow as the same class, so their classification results were an overestimation. Thus, future work for mapping perennial snow loss could focus on the separation of these similar classes. The authors shared their code on GEE so that other researchers interested in replicating their study or in using parts of their code for their own analyses can easily do so.
The authors in [242] used National Oceanic and Atmospheric Administration (NOAA) Advanced Very High Resolution Radiometer (AVHRR) data, MOD09GQ surface reflectance products, and Landsat surface reflectance Tier 1 products to study lake ice phenology (LIP) in Qinghai Lake. A threshold method was used to extract the lake ice area, with the threshold variables set by the red-band reflectance value and the difference between the red-band and near-infrared reflectance. The freeze-up start date was defined as the time point when the lake ice area was continuously greater than or equal to 10% of the lake area, and freeze-up end as the date when the lake ice area reached 90% or more of the lake area. Break-up start was defined as the date when the lake ice area stabilized at 90% or less of the lake area, and break-up end as the time point when the ice covered 10% or less of the lake. The presence of clouds and crushed ice may cause some errors in the results obtained from the different data sources.
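These date definitions reduce to threshold crossings on a lake-ice-area-fraction time series. A plain-Python sketch of the 10%/90% rules on a synthetic season; day indices stand in for calendar dates, and the "continuously/stable" qualifiers are simplified to first crossings:

```python
def ice_phenology(frac):
    """Derive freeze-up start/end and break-up start/end from a daily series of
    lake-ice area fraction, using simplified 10%/90% threshold rules."""
    fus = next(i for i, f in enumerate(frac) if f >= 0.10)           # freeze-up start
    fue = next(i for i, f in enumerate(frac) if f >= 0.90)           # freeze-up end
    bus = next(i for i in range(fue, len(frac)) if frac[i] <= 0.90)  # break-up start
    bue = next(i for i in range(bus, len(frac)) if frac[i] <= 0.10)  # break-up end
    return fus, fue, bus, bue

# synthetic season: ice grows, plateaus, then melts
frac = [0.0] * 5 + [0.2, 0.5, 0.8] + [0.95] * 10 + [0.7, 0.4, 0.15, 0.05] + [0.0] * 5
fus, fue, bus, bue = ice_phenology(frac)
```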

References

  1. Yang, L.; MacEachren, A.M.; Mitra, P.; Onorati, T. Visually-Enabled Active Deep Learning for (Geo) Text and Image Classification: A Review. ISPRS Int. J. Geo-Inf. 2018, 7, 65. [Google Scholar] [CrossRef] [Green Version]
  2. Sebestyén, V.; Czvetkó, T.; Abonyi, J. The Applicability of Big Data in Climate Change Research: The Importance of System of Systems Thinking. Front. Environ. Sci. 2021, 9, 619092. [Google Scholar] [CrossRef]
  3. Li, Z. Geospatial Big Data Handling with High Performance Computing: Current Approaches and Future Directions. In High Performance Computing for Geospatial Applications; Tang, W., Wang, S., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 53–76. ISBN 9783030479985. [Google Scholar]
  4. Lee, J.-G.; Kang, M. Geospatial Big Data: Challenges and Opportunities. Big Data Res. 2015, 2, 74–81. [Google Scholar] [CrossRef]
  5. Lippitt, C.D.; Zhang, S. The impact of small unmanned airborne platforms on passive optical remote sensing: A conceptual perspective. Int. J. Remote Sens. 2018, 39, 4852–4868. [Google Scholar] [CrossRef]
  6. Zhen, L.I.U.; Huadong, G.U.O.; Wang, C. Considerations on Geospatial Big Data. IOP Conf. Ser. Earth Environ. Sci. 2016, 46, 012058. [Google Scholar]
  7. Karimi, H.A. Big Data: Techniques and Technologies in Geoinformatics; CRC Press: Boca Raton, FL, USA, 2014; ISBN 9781466586512. [Google Scholar]
  8. Marr, B. Big Data: Using SMART Big Data, Analytics and Metrics to Make Better Decisions and Improve Performance; John Wiley & Sons: Hoboken, NJ, USA, 2015; ISBN 9781118965825. [Google Scholar]
  9. Deng, X.; Liu, P.; Liu, X.; Wang, R.; Zhang, Y.; He, J.; Yao, Y. Geospatial Big Data: New Paradigm of Remote Sensing Applications. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 3841–3851. [Google Scholar] [CrossRef]
  10. Kashyap, R. Geospatial Big Data, Analytics and IoT: Challenges, Applications and Potential. In Cloud Computing for Geospatial Big Data Analytics: Intelligent Edge, Fog and Mist Computing; Das, H., Barik, R.K., Dubey, H., Roy, D.S., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 191–213. ISBN 9783030033590. [Google Scholar]
  11. Yang, C.; Yu, M.; Hu, F.; Jiang, Y.; Li, Y. Utilizing Cloud Computing to address big geospatial data challenges. Comput. Environ. Urban Syst. 2017, 61, 120–128. [Google Scholar] [CrossRef] [Green Version]
  12. Liu, Y.; Dang, L.; Li, S.; Cai, K.; Zuo, X. Research Progress on Models, Algorithms, and Systems for Remote Sensing Spatial-Temporal Big Data Processing. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 5918–5931. [Google Scholar] [CrossRef]
  13. Liu, P.; Di, L.; Du, Q.; Wang, L. Remote Sensing Big Data: Theory, Methods and Applications. Remote Sens. 2018, 10, 711. [Google Scholar] [CrossRef] [Green Version]
  14. Gorelick, N.; Hancher, M.; Dixon, M.; Ilyushchenko, S.; Thau, D.; Moore, R. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 2017, 202, 18–27. [Google Scholar] [CrossRef]
  15. Wang, Y.; Ziv, G.; Adami, M.; Mitchard, E.; Batterman, S.A.; Buermann, W.; Marimon, B.S.; Junior, B.H.M.; Reis, S.M.; Rodrigues, D.; et al. Mapping tropical disturbed forests using multi-decadal 30 m optical satellite imagery. Remote Sens. Environ. 2018, 221, 474–488. [Google Scholar] [CrossRef]
  16. Teluguntla, P.; Thenkabail, P.S.; Oliphant, A.; Xiong, J.; Gumma, M.K.; Congalton, R.G.; Yadav, K.; Huete, A. A 30-m landsat-derived cropland extent product of Australia and China using random forest machine learning algorithm on Google Earth Engine cloud computing platform. ISPRS J. Photogramm. Remote Sens. 2018, 144, 325–340. [Google Scholar] [CrossRef]
  17. Amani, M.; Brisco, B.; Afshar, M.; Mirmazloumi, S.M.; Mahdavi, S.; Mirzadeh, S.M.J.; Huang, W.; Granger, J. A generalized supervised classification scheme to produce provincial wetland inventory maps: An application of Google Earth Engine for big geo data processing. Big Earth Data 2019, 3, 378–394. [Google Scholar] [CrossRef]
  18. Kumar, L.; Mutanga, O. Google Earth Engine Applications Since Inception: Usage, Trends, and Potential. Remote Sens. 2018, 10, 1509. [Google Scholar] [CrossRef] [Green Version]
  19. Samasse, K.; Hanan, N.P.; Anchang, J.Y.; Diallo, Y. A High-Resolution Cropland Map for the West African Sahel Based on High-Density Training Data, Google Earth Engine, and Locally Optimized Machine Learning. Remote Sens. 2020, 12, 1436. [Google Scholar] [CrossRef]
  20. Lippitt, C.D.; Stow, D.A.; Clarke, K.C. On the nature of models for time-sensitive remote sensing. Int. J. Remote Sens. 2014, 35, 6815–6841. [Google Scholar] [CrossRef]
  21. Zhou, B.; Okin, G.S.; Zhang, J. Leveraging Google Earth Engine (GEE) and machine learning algorithms to incorporate in situ measurement from different times for rangelands monitoring. Remote Sens. Environ. 2020, 236, 111521. [Google Scholar] [CrossRef]
  22. Sayad, Y.O.; Mousannif, H.; Al Moatassime, H. Predictive modeling of wildfires: A new dataset and machine learning approach. Fire Saf. J. 2019, 104, 130–146. [Google Scholar] [CrossRef]
  23. Esteva, A.; Robicquet, A.; Ramsundar, B.; Kuleshov, V.; Depristo, M.; Chou, K.; Cui, C.; Corrado, G.; Thrun, S.; Dean, J. A guide to deep learning in healthcare. Nat. Med. 2019, 25, 24–29. [Google Scholar] [CrossRef]
  24. Davenport, T.; Kalakota, R. The potential for artificial intelligence in healthcare. Future Health J. 2019, 6, 94–98. [Google Scholar] [CrossRef] [Green Version]
  25. Mittal, S.; Hasija, Y. Applications of Deep Learning in Healthcare and Biomedicine. In Deep Learning Techniques for Biomedical and Health Informatics; Dash, S., Acharya, B.R., Mittal, M., Abraham, A., Kelemen, A., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 57–77. ISBN 9783030339661. [Google Scholar]
  26. Boulos, M.N.K.; Peng, G.; VoPham, T. An overview of GeoAI applications in health and healthcare. Int. J. Health Geogr. 2019, 18, 7. [Google Scholar] [CrossRef] [PubMed]
  27. Amani, M.; Ghorbanian, A.; Ahmadi, S.A.; Kakooei, M.; Moghimi, A.; Mirmazloumi, S.M.; Moghaddam, S.H.A.; Mahdavi, S.; Ghahremanloo, M.; Parsian, S.; et al. Google Earth Engine Cloud Computing Platform for Remote Sensing Big Data Applications: A Comprehensive Review. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 5326–5350. [Google Scholar] [CrossRef]
  28. Wang, L.; Diao, C.; Xian, G.; Yin, D.; Lu, Y.; Zou, S.; Erickson, T.A. A summary of the special issue on remote sensing of land change science with Google earth engine. Remote Sens. Environ. 2020, 248, 112002. [Google Scholar] [CrossRef]
  29. Tamiminia, H.; Salehi, B.; Mahdianpari, M.; Quackenbush, L.; Adeli, S.; Brisco, B. Google Earth Engine for geo-big data applications: A meta-analysis and systematic review. ISPRS J. Photogramm. Remote Sens. 2020, 164, 152–170. [Google Scholar] [CrossRef]
  30. Hoeser, T.; Kuenzer, C. Object Detection and Image Segmentation with Deep Learning on Earth Observation Data: A Review-Part I: Evolution and Recent Trends. Remote Sens. 2020, 12, 1667. [Google Scholar] [CrossRef]
  31. Zhu, X.X.; Tuia, D.; Mou, L.; Xia, G.-S.; Zhang, L.; Xu, F.; Fraundorfer, F. Deep Learning in Remote Sensing: A Comprehensive Review and List of Resources. IEEE Geosci. Remote Sens. Mag. 2017, 5, 8–36. [Google Scholar] [CrossRef] [Green Version]
  32. Pekel, J.-F.; Cottam, A.; Gorelick, N.; Belward, A.S. High-resolution mapping of global surface water and its long-term changes. Nature 2016, 540, 418–422. [Google Scholar] [CrossRef]
  33. Decuyper, M.; Chávez, R.O.; Lohbeck, M.; Lastra, J.A.; Tsendbazar, N.; Hackländer, J.; Herold, M.; Vågen, T.-G. Continuous monitoring of forest change dynamics with satellite time series. Remote Sens. Environ. 2021, 269, 112829. [Google Scholar] [CrossRef]
  34. Guo, H.-D.; Zhang, L.; Zhu, L.-W. Earth observation big data for climate change research. Adv. Clim. Chang. Res. 2015, 6, 108–117. [Google Scholar] [CrossRef]
  35. Hird, J.N.; DeLancey, E.R.; McDermid, G.J.; Kariyeva, J. Google Earth Engine, Open-Access Satellite Data, and Machine Learning in Support of Large-Area Probabilistic Wetland Mapping. Remote Sens. 2017, 9, 1315. [Google Scholar] [CrossRef] [Green Version]
  36. Hsu, A.; Khoo, W.; Goyal, N.; Wainstein, M. Next-Generation Digital Ecosystem for Climate Data Mining and Knowledge Discovery: A Review of Digital Data Collection Technologies. Front. Big Data 2020, 3, 29. [Google Scholar] [CrossRef] [PubMed]
  37. Google Earth Engine. A Planetary-Scale Platform for Earth Science & Data Analysis. Available online: https://earthengine.google.com/ (accessed on 19 November 2019).
  38. National Aeronautics and Space Administration (NASA). Welcome to the NASA Earth Exchange (NEX). Available online: https://www.nasa.gov/nex (accessed on 23 April 2022).
  39. National Aeronautics and Space Administration (NASA). Geostationary-NASA Earth Exchange (GeoNEX). Available online: https://www.nasa.gov/geonex (accessed on 23 April 2022).
  40. Earth on AWS. Available online: https://aws.amazon.com/earth/ (accessed on 10 July 2019).
  41. Chandrashekar, S. Announcing Real-Time Geospatial Analytics in Azure Stream Analytics. Available online: https://azure.microsoft.com/en-us/blog/announcing-real-time-geospatial-analytics-in-azure-stream-analytics/ (accessed on 23 April 2022).
  42. Microsoft. Microsoft Planetary Computer. Available online: https://planetarycomputer.microsoft.com/ (accessed on 23 April 2022).
  43. Parente, L.; Taquary, E.; Silva, A.P.; Souza, C.; Ferreira, L. Next Generation Mapping: Combining Deep Learning, Cloud Computing, and Big Remote Sensing Data. Remote Sens. 2019, 11, 2881. [Google Scholar] [CrossRef] [Green Version]
  44. Ma, L.; Liu, Y.; Zhang, X.; Ye, Y.; Yin, G.; Johnson, B.A. Deep learning in remote sensing applications: A meta-analysis and review. ISPRS J. Photogramm. Remote Sens. 2019, 152, 166–177. [Google Scholar] [CrossRef]
  45. Lobell, D.B.; Thau, D.; Seifert, C.; Engle, E.; Little, B. A scalable satellite-based crop yield mapper. Remote Sens. Environ. 2015, 164, 324–333. [Google Scholar] [CrossRef]
  46. Shelestov, A.; Lavreniuk, M.; Kussul, N.; Novikov, A.; Skakun, S. Exploring Google Earth Engine Platform for Big Data Processing: Classification of Multi-Temporal Satellite Imagery for Crop Mapping. Front. Earth Sci. 2017, 5, 17. [Google Scholar] [CrossRef] [Green Version]
  47. Xiong, J.; Thenkabail, P.S.; Tilton, J.C.; Gumma, M.K.; Teluguntla, P.; Oliphant, A.; Congalton, R.G.; Yadav, K.; Gorelick, N. Nominal 30-m cropland extent map of continental Africa by integrating pixel-based and object-based algorithms using Sentinel-2 and Landsat-8 data on Google Earth Engine. Remote Sens. 2017, 9, 1065. [Google Scholar] [CrossRef] [Green Version]
  48. Xiong, J.; Thenkabail, P.S.; Gumma, M.K.; Teluguntla, P.; Poehnelt, J.; Congalton, R.G.; Yadav, K.; Thau, D. Automated cropland mapping of continental Africa using Google Earth Engine cloud computing. ISPRS J. Photogramm. Remote Sens. 2017, 126, 225–244. [Google Scholar] [CrossRef] [Green Version]
  49. Deines, J.M.; Kendall, A.D.; Hyndman, D.W. Annual Irrigation Dynamics in the U.S. Northern High Plains Derived from Landsat Satellite Data. Geophys. Res. Lett. 2017, 44, 9350–9360. [Google Scholar] [CrossRef]
  50. Kelley, L.C.; Pitcher, L.; Bacon, C. Using Google Earth Engine to Map Complex Shade-Grown Coffee Landscapes in Northern Nicaragua. Remote Sens. 2018, 10, 952. [Google Scholar] [CrossRef] [Green Version]
  51. Ragettli, S.; Herberz, T.; Siegfried, T. An Unsupervised Classification Algorithm for Multi-Temporal Irrigated Area Mapping in Central Asia. Remote Sens. 2018, 10, 1823. [Google Scholar] [CrossRef] [Green Version]
  52. Ghazaryan, G.; Dubovyk, O.; Löw, F.; Lavreniuk, M.; Kolotii, A.; Schellberg, J.; Kussul, N. A rule-based approach for crop identification using multi-temporal and multi-sensor phenological metrics. Eur. J. Remote Sens. 2018, 51, 511–524. [Google Scholar] [CrossRef]
  53. Mandal, D.; Kumar, V.; Bhattacharya, A.; Rao, Y.S.; Siqueira, P.; Bera, S. Sen4Rice: A Processing Chain for Differentiating Early and Late Transplanted Rice Using Time-Series Sentinel-1 SAR Data with Google Earth Engine. IEEE Geosci. Remote Sens. Lett. 2018, 15, 1947–1951. [Google Scholar] [CrossRef]
  54. Oliphant, A.J.; Thenkabail, P.S.; Teluguntla, P.; Xiong, J.; Gumma, M.K.; Congalton, R.G.; Yadav, K. Mapping cropland extent of Southeast and Northeast Asia using multi-year time-series Landsat 30-m data using a random forest classifier on the Google Earth Engine cloud. Int. J. App. Earth Observ. Geoinf. 2019, 81, 110–124. [Google Scholar] [CrossRef]
  55. Sun, J.; Di, L.; Sun, Z.; Shen, Y.; Lai, Z. County-Level Soybean Yield Prediction Using Deep CNN-LSTM Model. Sensors 2019, 19, 4363. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  56. Wang, M.; Liu, Z.; Baig, M.H.A.; Wang, Y.; Li, Y.; Chen, Y. Mapping sugarcane in complex landscapes by integrating multi-temporal Sentinel-2 images and machine learning algorithms. Land Use Policy 2019, 88, 104190. [Google Scholar] [CrossRef]
  57. Tian, F.; Wu, B.; Zeng, H.; Zhang, X.; Xu, J. Efficient Identification of Corn Cultivation Area with Multitemporal Synthetic Aperture Radar and Optical Images in the Google Earth Engine Cloud Platform. Remote Sens. 2019, 11, 629. [Google Scholar] [CrossRef] [Green Version]
  58. Xie, Y.; Lark, T.J.; Brown, J.F.; Gibbs, H.K. Mapping irrigated cropland extent across the conterminous United States at 30 m resolution using a semi-automatic training approach on Google Earth Engine. ISPRS J. Photogramm. Remote Sens. 2019, 155, 136–149. [Google Scholar] [CrossRef]
  59. Jin, Z.; Azzari, G.; You, C.; Di Tommaso, S.; Aston, S.; Burke, M.; Lobell, D.B. Smallholder maize area and yield mapping at national scales with Google Earth Engine. Remote Sens. Environ. 2019, 228, 115–128. [Google Scholar] [CrossRef]
  60. Rudiyanto; Minasny, B.; Shah, R.M.; Che Soh, N.; Arif, C.; Indra Setiawan, B.; Rudiyanto Minasny, B. Automated Near-Real-Time Mapping and Monitoring of Rice Extent, Cropping Patterns, and Growth Stages in Southeast Asia Using Sentinel-1 Time Series on a Google Earth Engine Platform. Remote Sens. 2019, 11, 1666. [Google Scholar] [CrossRef] [Green Version]
  61. Wang, S.; Azzari, G.; Lobell, D.B. Crop type mapping without field-level labels: Random forest transfer and unsupervised clustering techniques. Remote Sens. Environ. 2019, 222, 303–317. [Google Scholar] [CrossRef]
  62. Liang, L.; Runkle, B.R.K.; Sapkota, B.B.; Reba, M.L. Automated mapping of rice fields using multi-year training sample normalization. Int. J. Remote Sens. 2019, 40, 7252–7271. [Google Scholar] [CrossRef]
  63. Tian, H.F.; Huang, N.; Niu, Z.; Qin, Y.C.; Pei, J.; Wang, J. Mapping Winter Crops in China with Multi-Source Satellite Imagery and Phenology-Based Algorithm. Remote Sens. 2019, 11, 820. [Google Scholar] [CrossRef] [Green Version]
  64. Neetu; Ray, S.S. Exploring machine learning classification algorithms for crop classification using sentinel 2 data. Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. 2019, XLII-3/W6, 573–578. [Google Scholar] [CrossRef] [Green Version]
  65. Gumma, M.K.; Thenkabail, P.S.; Teluguntla, P.G.; Oliphant, A.; Xiong, J.; Giri, C.; Pyla, V.; Dixit, S.; Whitbread, A.M. Agricultural cropland extent and areas of South Asia derived using Landsat satellite 30-m time-series big-data using random forest machine learning algorithms on the Google Earth Engine cloud. GISci. Remote Sens. 2019, 57, 302–322. [Google Scholar] [CrossRef] [Green Version]
  66. Han, J.; Zhang, Z.; Cao, J.; Luo, Y.; Zhang, L.; Li, Z.; Zhang, J. Prediction of Winter Wheat Yield Based on Multi-Source Data and Machine Learning in China. Remote Sens. 2020, 12, 236. [Google Scholar] [CrossRef] [Green Version]
  67. Phalke, A.R.; Özdoğan, M.; Thenkabail, P.S.; Erickson, T.; Gorelick, N.; Yadav, K.; Congalton, R.G. Mapping Croplands of Europe, Middle East, Russia, and Central Asia Using Landsat, Random Forest, and Google Earth Engine. ISPRS J. Photogramm. Remote Sens. 2020, 167, 104–122. [Google Scholar] [CrossRef]
  68. Chen, N.; Yu, L.; Zhang, X.; Shen, Y.; Zeng, L.; Hu, Q.; Niyogi, D. Mapping Paddy Rice Fields by Combining Multi-Temporal Vegetation Index and Synthetic Aperture Radar Remote Sensing Data Using Google Earth Engine Machine Learning Platform. Remote Sens. 2020, 12, 2992. [Google Scholar] [CrossRef]
  69. Amani, M.; Kakooei, M.; Moghimi, A.; Ghorbanian, A.; Ranjgar, B.; Mahdavi, S.; Davidson, A.; Fisette, T.; Rollin, P.; Brisco, B.; et al. Application of Google Earth Engine Cloud Computing Platform, Sentinel Imagery, and Neural Networks for Crop Mapping in Canada. Remote Sens. 2020, 12, 3561. [Google Scholar] [CrossRef]
  70. You, N.; Dong, J. Examining Earliest Identifiable Timing of Crops Using All Available Sentinel 1/2 Imagery and Google Earth Engine. ISPRS J. Photogramm. Remote Sens. 2020, 161, 109–123. [Google Scholar]
  71. Poortinga, A.; Thwal, N.S.; Khanal, N.; Mayer, T.; Bhandari, B.; Markert, K.; Nicolau, A.P.; Dilger, J.; Tenneson, K.; Clinton, N.; et al. Mapping sugarcane in Thailand using transfer learning, a lightweight convolutional neural network, NICFI high resolution satellite imagery and Google Earth Engine. ISPRS Open J. Photogramm. Remote Sens. 2021, 1, 100003. [Google Scholar] [CrossRef]
  72. Adrian, J.; Sagan, V.; Maimaitijiang, M. Sentinel SAR-optical fusion for crop type mapping using deep learning and Google Earth Engine. ISPRS J. Photogramm. Remote Sens. 2021, 175, 215–235. [Google Scholar] [CrossRef]
  73. Cao, J.; Zhang, Z.; Luo, Y.; Zhang, L.; Zhang, J.; Li, Z.; Tao, F. Wheat yield predictions at a county and field scale with deep learning, machine learning, and google earth engine. Eur. J. Agron. 2020, 123, 126204. [Google Scholar] [CrossRef]
  74. Luo, C.; Qi, B.; Liu, H.; Guo, D.; Lu, L.; Fu, Q.; Shao, Y. Using Time Series Sentinel-1 Images for Object-Oriented Crop Classification in Google Earth Engine. Remote Sens. 2021, 13, 561. [Google Scholar] [CrossRef]
  75. Ni, R.; Tian, J.; Li, X.; Yin, D.; Li, J.; Gong, H.; Zhang, J.; Zhu, L.; Wu, D. An enhanced pixel-based phenological feature for accurate paddy rice mapping with Sentinel-2 imagery in Google Earth Engine. ISPRS J. Photogramm. Remote Sens. 2021, 178, 282–296. [Google Scholar] [CrossRef]
  76. Sun, Y.; Qin, Q.; Ren, H.; Zhang, Y. Decameter Cropland LAI/FPAR Estimation from Sentinel-2 Imagery Using Google Earth Engine. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–14. [Google Scholar] [CrossRef]
  77. Li, M.; Zhang, R.; Luo, H.; Gu, S.; Qin, Z. Crop Mapping in the Sanjiang Plain Using an Improved Object-Oriented Method Based on Google Earth Engine and Combined Growth Period Attributes. Remote Sens. 2022, 14, 273. [Google Scholar] [CrossRef]
  78. Han, L.; Ding, J.; Wang, J.; Zhang, J.; Xie, B.; Hao, J. Monitoring Oasis Cotton Fields Expansion in Arid Zones Using the Google Earth Engine: A Case Study in the Ogan-Kucha River Oasis, Xinjiang, China. Remote Sens. 2022, 14, 225. [Google Scholar] [CrossRef]
  79. Hedayati, A.; Vahidnia, M.H.; Behzadi, S. Paddy lands detection using Landsat-8 satellite images and object-based classification in Rasht city, Iran. Egypt. J. Remote Sens. Space Sci. 2022, 25, 73–84. [Google Scholar] [CrossRef]
  80. Azzari, G.; Lobell, D. Landsat-based classification in the cloud: An opportunity for a paradigm shift in land cover monitoring. Remote Sens. Environ. 2017, 202, 64–74. [Google Scholar] [CrossRef]
  81. Midekisa, A.; Holl, F.; Savory, D.J.; Andrade-Pacheco, R.; Gething, P.; Bennett, A.; Sturrock, H. Mapping land cover change over continental Africa using Landsat and Google Earth Engine cloud computing. PLoS ONE 2017, 12, e0184926. [Google Scholar] [CrossRef]
  82. Hu, Y.; Dong, Y.; Batunacun. An Automatic Approach for Land-Change Detection and Land Updates Based on Integrated NDVI Timing Analysis and the CVAPS Method with GEE Support. ISPRS J. Photogramm. Remote Sens. 2018, 146, 347–359. [Google Scholar] [CrossRef]
  83. Ge, Y.; Hu, S.; Ren, Z.; Jia, Y.; Wang, J.; Liu, M.; Zhang, D.; Zhao, W.; Luo, Y.; Fu, Y.; et al. Mapping annual land use changes in China’s poverty-stricken areas from 2013 to 2018. Remote Sens. Environ. 2019, 232, 111285. [Google Scholar] [CrossRef]
  84. Lee, J.; Cardille, J.A.; Coe, M.T. BULC-U: Sharpening Resolution and Improving Accuracy of Land-Use/Land-Cover Classifications in Google Earth Engine. Remote Sens. 2018, 10, 1455. [Google Scholar] [CrossRef] [Green Version]
  85. Zurqani, H.A.; Post, C.J.; Mikhailova, E.A.; Schlautman, M.A.; Sharp, J.L. Geospatial analysis of land use change in the Savannah River Basin using Google Earth Engine. Int. J. Appl. Earth Obs. Geoinf. ITC J. 2018, 69, 175–185. [Google Scholar] [CrossRef]
  86. Murray, N.J.; Keith, D.A.; Simpson, D.; Wilshire, J.H.; Lucas, R.M. Remap: An online remote sensing application for land cover classification and monitoring. Methods Ecol. Evol. 2018, 9, 2019–2027. [Google Scholar] [CrossRef] [Green Version]
  87. Mardani, M.; Mardani, H.; De Simone, L.; Varas, S.; Kita, N.; Saito, T. Integration of Machine Learning and Open Access Geospatial Data for Land Cover Mapping. Remote Sens. 2019, 11, 1907. [Google Scholar] [CrossRef] [Green Version]
  88. Gong, P.; Liu, H.; Zhang, M.; Li, C.; Wang, J.; Huang, H.; Clinton, N.; Ji, L.; Li, W.; Bai, Y.; et al. Stable classification with limited sample: Transferring a 30-m resolution sample set collected in 2015 to mapping 10-m resolution global land cover in 2017. Sci. Bull. 2019, 64, 370–373. [Google Scholar] [CrossRef] [Green Version]
  89. Hao, B.; Ma, M.; Li, S.; Li, Q.; Hao, D.; Huang, J.; Ge, Z.; Yang, H.; Han, X. Land Use Change and Climate Variation in the Three Gorges Reservoir Catchment from 2000 to 2015 Based on the Google Earth Engine. Sensors 2019, 19, 2118. [Google Scholar] [CrossRef] [Green Version]
  90. Miettinen, J.; Shi, C.; Liew, S.C. Towards automated 10–30 m resolution land cover mapping in insular South-East Asia. Geocarto Int. 2017, 34, 443–457. [Google Scholar] [CrossRef]
  91. Xie, S.; Liu, L.; Zhang, X.; Yang, J.; Chen, X.; Gao, Y. Automatic Land-Cover Mapping using Landsat Time-Series Data based on Google Earth Engine. Remote Sens. 2019, 11, 3023. [Google Scholar] [CrossRef] [Green Version]
  92. Adepoju, K.A.; Adelabu, S.A. Improving accuracy of Landsat-8 OLI classification using image composite and multisource data with Google Earth Engine. Remote Sens. Lett. 2019, 11, 107–116. [Google Scholar] [CrossRef]
  93. Ghorbanian, A.; Kakooei, M.; Amani, M.; Mahdavi, S.; Mohammadzadeh, A.; Hasanlou, M. Improved land cover map of Iran using Sentinel imagery within Google Earth Engine and a novel automatic workflow for land cover classification using migrated training samples. ISPRS J. Photogramm. Remote. Sens. 2020, 167, 276–288. [Google Scholar] [CrossRef]
  94. Liang, J.; Xie, Y.; Sha, Z.; Zhou, A. Modeling urban growth sustainability in the cloud by augmenting Google Earth Engine (GEE). Comput. Environ. Urban Syst. 2020, 84, 101542. [Google Scholar] [CrossRef]
  95. Zeng, H.; Wu, B.; Wang, S.; Musakwa, W.; Tian, F.; Mashimbye, Z.E.; Poona, N.; Syndey, M. A Synthesizing Land-cover Classification Method Based on Google Earth Engine: A Case Study in Nzhelele and Levhuvu Catchments, South Africa. Chin. Geogr. Sci. 2020, 30, 397–409. [Google Scholar] [CrossRef]
  96. Naboureh, A.; Li, A.; Bian, J.; Lei, G.; Amani, M. A Hybrid Data Balancing Method for Classification of Imbalanced Training Data within Google Earth Engine: Case Studies from Mountainous Regions. Remote Sens. 2020, 12, 3301. [Google Scholar] [CrossRef]
  97. Naboureh, A.; Ebrahimy, H.; Azadbakht, M.; Bian, J.; Amani, M. RUESVMs: An Ensemble Method to Handle the Class Imbalance Problem in Land Cover Mapping Using Google Earth Engine. Remote Sens. 2020, 12, 3484. [Google Scholar] [CrossRef]
  98. Li, Q.; Qiu, C.; Ma, L.; Schmitt, M.; Zhu, X.X. Mapping the Land Cover of Africa at 10 m Resolution from Multi-Source Remote Sensing Data with Google Earth Engine. Remote Sens. 2020, 12, 602. [Google Scholar] [CrossRef] [Green Version]
  99. Huang, H.; Wang, J.; Liu, C.; Liang, L.; Li, C.; Gong, P. The migration of training samples towards dynamic global land cover mapping. ISPRS J. Photogramm. Remote Sens. 2020, 161, 27–36. [Google Scholar] [CrossRef]
  100. Tassi, A.; Vizzari, M. Object-Oriented LULC Classification in Google Earth Engine Combining SNIC, GLCM, and Machine Learning Algorithms. Remote Sens. 2020, 12, 3776. [Google Scholar] [CrossRef]
  101. Shetty, S.; Gupta, P.; Belgiu, M.; Srivastav, S. Assessing the Effect of Training Sampling Design on the Performance of Machine Learning Classifiers for Land Cover Mapping Using Multi-Temporal Remote Sensing Data and Google Earth Engine. Remote Sens. 2021, 13, 1433. [Google Scholar] [CrossRef]
  102. Feizizadeh, B.; Omarzadeh, D.; Garajeh, M.K.; Lakes, T.; Blaschke, T. Machine learning data-driven approaches for land use/cover mapping and trend analysis using Google Earth Engine. J. Environ. Plan. Manag. 2021, 1–33. [Google Scholar] [CrossRef]
  103. Shafizadeh-Moghadam, H.; Khazaei, M.; Alavipanah, S.K.; Weng, Q. Google Earth Engine for large-scale land use and land cover mapping: An object-based classification approach using spectral, textural and topographical factors. GISci. Remote Sens. 2021, 58, 914–928. [Google Scholar] [CrossRef]
  104. Pan, X.; Wang, Z.; Gao, Y.; Dang, X.; Han, Y. Detailed and automated classification of land use/land cover using machine learning algorithms in Google Earth Engine. Geocarto Int. 2021, 1–18. [Google Scholar] [CrossRef]
  105. Becker, W.R.; Ló, T.B.; Johann, J.A.; Mercante, E. Statistical features for land use and land cover classification in Google Earth Engine. Remote Sens. Appl. Soc. Environ. 2020, 21, 100459. [Google Scholar] [CrossRef]
  106. Jin, Q.; Xu, E.; Zhang, X. A Fusion Method for Multisource Land Cover Products Based on Superpixels and Statistical Extraction for Enhancing Resolution and Improving Accuracy. Remote Sens. 2022, 14, 1676. [Google Scholar] [CrossRef]
  107. Lee, J.S.H.; Wich, S.; Widayati, A.; Koh, L.P. Detecting industrial oil palm plantations on Landsat images with Google Earth Engine. Remote Sens. Appl. Soc. Environ. 2016, 4, 219–224. [Google Scholar] [CrossRef] [Green Version]
  108. Voight, C.; Hernandez-Aguilar, K.; Garcia, C.; Gutierrez, S. Predictive Modeling of Future Forest Cover Change Patterns in Southern Belize. Remote Sens. 2019, 11, 823. [Google Scholar] [CrossRef] [Green Version]
  109. Koskinen, J.; Leinonen, U.; Vollrath, A.; Ortmann, A.; Lindquist, E.; D’Annunzio, R.; Pekkarinen, A.; Käyhkö, N. Participatory mapping of forest plantations with Open Foris and Google Earth Engine. ISPRS J. Photogramm. Remote Sens. 2018, 148, 63–74. [Google Scholar] [CrossRef]
  110. Duan, Q.; Tan, M.; Guo, Y.; Wang, X.; Xin, L. Understanding the Spatial Distribution of Urban Forests in China Using Sentinel-2 Images with Google Earth Engine. Forests 2019, 10, 729. [Google Scholar] [CrossRef] [Green Version]
  111. Poortinga, A.; Tenneson, K.; Shapiro, A.; Nquyen, Q.; Aung, K.S.; Chishtie, F.; Saah, D. Mapping Plantations in Myanmar by Fusing Landsat-8, Sentinel-2 and Sentinel-1 Data along with Systematic Error Quantification. Remote Sens. 2019, 11, 831. [Google Scholar] [CrossRef] [Green Version]
  112. Shimizu, K.; Ota, T.; Mizoue, N. Detecting Forest Changes Using Dense Landsat 8 and Sentinel-1 Time Series Data in Tropical Seasonal Forests. Remote Sens. 2019, 11, 1899. [Google Scholar] [CrossRef] [Green Version]
  113. Ramdani, F. Recent expansion of oil palm plantation in the most eastern part of Indonesia: Feature extraction with polarimetric SAR. Int. J. Remote Sens. 2018, 40, 7371–7388. [Google Scholar] [CrossRef]
  114. Çolak, E.; Chandra, M.; Sunar, F. The use of multi-temporal sentinel satellites in the analysis of land cover/land use changes caused by the nuclear power plant construction. ISPRS-Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2019, XLII-3/W8, 491–495. [Google Scholar] [CrossRef] [Green Version]
  115. Shaharum, N.S.N.; Shafri, H.Z.M.; Ghani, W.A.W.A.K.; Samsatli, S.; Al-Habshi, M.M.A.; Yusuf, B. Oil palm mapping over Peninsular Malaysia using Google Earth Engine and machine learning algorithms. Remote Sens. Appl. Soc. Environ. 2020, 17, 100287. [Google Scholar] [CrossRef]
  116. De Sousa, C.; Fatoyinbo, L.; Neigh, C.; Boucka, F.; Angoue, V.; Larsen, T. Cloud-computing and machine learning in support of country-level land cover and ecosystem extent mapping in Liberia and Gabon. PLoS ONE 2020, 15, e0227438. [Google Scholar] [CrossRef] [PubMed]
  117. Brovelli, M.A.; Sun, Y.; Yordanov, V. Monitoring Forest Change in the Amazon Using Multi-Temporal Remote Sensing Data and Machine Learning Classification on Google Earth Engine. ISPRS Int. J. Geo-Inf. 2020, 9, 580. [Google Scholar] [CrossRef]
  118. Kamal, M.; Farda, N.M.; Jamaluddin, I.; Parela, A.; Wikantika, K.; Prasetyo, L.B.; Irawan, B. A preliminary study on machine learning and google earth engine for mangrove mapping. IOP Conf. Series Earth Environ. Sci. 2020, 500, 012038. [Google Scholar] [CrossRef]
  119. Wei, C.; Karger, D.N.; Wilson, A.M. Spatial detection of alpine treeline ecotones in the Western United States. Remote Sens. Environ. 2020, 240, 111672. [Google Scholar] [CrossRef]
  120. Praticò, S.; Solano, F.; Di Fazio, S.; Modica, G. Machine Learning Classification of Mediterranean Forest Habitats in Google Earth Engine Based on Seasonal Sentinel-2 Time-Series and Input Image Composition Optimisation. Remote Sens. 2021, 13, 586. [Google Scholar] [CrossRef]
  121. Xie, B.; Cao, C.; Xu, M.; Duerler, R.; Yang, X.; Bashir, B.; Chen, Y.; Wang, K. Analysis of Regional Distribution of Tree Species Using Multi-Seasonal Sentinel-1&2 Imagery within Google Earth Engine. Forests 2021, 12, 565. [Google Scholar] [CrossRef]
  122. Floreano, I.X.; de Moraes, L.A.F. Land Use/land Cover (LULC) Analysis (2009–2019) with Google Earth Engine and 2030 Prediction Using Markov-CA in the Rondônia State, Brazil. Environ. Monit. Assess. 2021, 193, 239. [Google Scholar] [CrossRef]
  123. Kumar, M.; Phukon, S.N.; Paygude, A.C.; Tyagi, K.; Singh, H. Mapping Phenological Functional Types (PhFT) in the Indian Eastern Himalayas using machine learning algorithm in Google Earth Engine. Comput. Geosci. 2021, 158, 104982. [Google Scholar] [CrossRef]
  124. Zhao, F.; Sun, R.; Zhong, L.; Meng, R.; Huang, C.; Zeng, X.; Wang, M.; Li, Y.; Wang, Z. Monthly mapping of forest harvesting using dense time series Sentinel-1 SAR imagery and deep learning. Remote Sens. Environ. 2021, 269, 112822. [Google Scholar] [CrossRef]
  125. Wimberly, M.C.; Dwomoh, F.K.; Numata, I.; Mensah, F.; Amoako, J.; Nekorchuk, D.M.; McMahon, A. Historical trends of degradation, loss, and recovery in the tropical forest reserves of Ghana. Int. J. Digit. Earth 2022, 15, 30–51. [Google Scholar] [CrossRef]
  126. Johansen, K.; Phinn, S.; Taylor, M. Mapping woody vegetation clearing in Queensland, Australia from Landsat imagery using the Google Earth Engine. Remote Sens. Appl. Soc. Environ. 2015, 1, 36–49. [Google Scholar] [CrossRef]
  127. Traganos, D.; Aggarwal, B.; Poursanidis, D.; Topouzelis, K.; Chrysoulakis, N.; Reinartz, P. Towards Global-Scale Seagrass Mapping and Monitoring Using Sentinel-2 on Google Earth Engine: The Case Study of the Aegean and Ionian Seas. Remote Sens. 2018, 10, 1227. [Google Scholar] [CrossRef] [Green Version]
  128. Tsai, Y.H.; Stow, D.; Chen, H.L.; Lewison, R.; An, L.; Shi, L. Mapping Vegetation and Land Use Types in Fanjingshan National Nature Reserve Using Google Earth Engine. Remote Sens. 2018, 10, 927. [Google Scholar] [CrossRef] [Green Version]
  129. Jansen, V.S.; Kolden, C.A.; Schmalz, H.J. The Development of Near Real-Time Biomass and Cover Estimates for Adaptive Rangeland Management Using Landsat 7 and Landsat 8 Surface Reflectance Products. Remote Sens. 2018, 10, 1057. [Google Scholar] [CrossRef] [Green Version]
  130. Jones, M.O.; Allred, B.W.; Naugle, D.E.; Maestas, J.; Donnelly, P.; Metz, L.J.; Karl, J.; Smith, R.; Bestelmeyer, B.; Boyd, C.; et al. Innovation in rangeland monitoring: Annual, 30 m, plant functional type percent cover maps for U.S. rangelands, 1984–2017. Ecosphere 2018, 9, e02430. [Google Scholar] [CrossRef]
  131. Campos-Taberner, M.; Moreno-Martínez, Á.; García-Haro, F.J.; Camps-Valls, G.; Robinson, N.P.; Kattge, J.; Running, S.W. Global Estimation of Biophysical Variables from Google Earth Engine Platform. Remote Sens. 2018, 10, 1167. [Google Scholar] [CrossRef] [Green Version]
  132. Xin, Y.; Adler, P.R. Mapping Miscanthus Using Multi-Temporal Convolutional Neural Network and Google Earth Engine. In Proceedings of the 3rd ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery, Chicago, IL, USA, 5 November 2019; pp. 81–84. [Google Scholar] [CrossRef] [Green Version]
  133. Parente, L.; Mesquita, V.; Miziara, F.; Baumann, L.; Ferreira, L. Assessing the pasturelands and livestock dynamics in Brazil, from 1985 to 2017: A novel approach based on high spatial resolution imagery and Google Earth Engine cloud computing. Remote Sens. Environ. 2019, 232, 111301. [Google Scholar] [CrossRef]
  134. Zhang, M.; Gong, P.; Qi, S.; Liu, C.; Xiong, T. Mapping bamboo with regional phenological characteristics derived from dense Landsat time series using Google Earth Engine. Int. J. Remote Sens. 2019, 40, 9541–9555. [Google Scholar] [CrossRef]
  135. Alencar, A.; Shimbo, J.Z.; Lenti, F.; Balzani Marques, C.; Zimbres, B.; Rosa, M.; Arruda, V.; Castro, I.; Fernandes Márcico Ribeiro, J.P.; Varela, V.; et al. Mapping Three Decades of Changes in the Brazilian Savanna Native Vegetation Using Landsat Data Processed in the Google Earth Engine Platform. Remote Sens. 2020, 12, 924. [Google Scholar] [CrossRef] [Green Version]
  136. Tian, J.; Wang, L.; Yin, D.; Li, X.; Diao, C.; Gong, H.; Shi, C.; Menenti, M.; Ge, Y.; Nie, S.; et al. Development of spectral-phenological features for deep learning to understand Spartina alterniflora invasion. Remote Sens. Environ. 2020, 242, 111745. [Google Scholar] [CrossRef]
  137. Srinet, R.; Nandy, S.; Padalia, H.; Ghosh, S.; Watham, T.; Patel, N.R.; Chauhan, P. Mapping plant functional types in Northwest Himalayan foothills of India using random forest algorithm in Google Earth Engine. Int. J. Remote Sens. 2020, 41, 7296–7309. [Google Scholar] [CrossRef]
  138. Long, X.; Li, X.; Lin, H.; Zhang, M. Mapping the vegetation distribution and dynamics of a wetland using adaptive-stacking and Google Earth Engine based on multi-source remote sensing data. Int. J. Appl. Earth Obs. Geoinf. ITC J. 2021, 102, 102453. [Google Scholar] [CrossRef]
  139. Yan, D.; Li, J.; Yao, X.; Luan, Z. Quantifying the Long-Term Expansion and Dieback of Spartina Alterniflora Using Google Earth Engine and Object-Based Hierarchical Random Forest Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 9781–9793. [Google Scholar] [CrossRef]
  140. Wu, N.; Shi, R.; Zhuo, W.; Zhang, C.; Zhou, B.; Xia, Z.; Tao, Z.; Gao, W.; Tian, B. A Classification of Tidal Flat Wetland Vegetation Combining Phenological Features with Google Earth Engine. Remote Sens. 2021, 13, 443. [Google Scholar] [CrossRef]
  141. Pipia, L.; Amin, E.; Belda, S.; Salinero-Delgado, M.; Verrelst, J. Green LAI Mapping and Cloud Gap-Filling Using Gaussian Process Regression in Google Earth Engine. Remote Sens. 2021, 13, 403. [Google Scholar] [CrossRef]
  142. Zou, Z.; Dong, J.; Menarguez, M.A.; Xiao, X.; Qin, Y.; Doughty, R.B.; Hooker, K.V.; Hambright, K.D. Continued decrease of open surface water body area in Oklahoma during 1984–2015. Sci. Total Environ. 2017, 595, 451–460. [Google Scholar] [CrossRef]
  143. Chen, F.; Zhang, M.; Tian, B.; Li, Z. Extraction of Glacial Lake Outlines in Tibet Plateau Using Landsat 8 Imagery and Google Earth Engine. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 4002–4009. [Google Scholar] [CrossRef]
  144. Wang, C.; Jia, M.; Chen, N.; Wang, W. Long-Term Surface Water Dynamics Analysis Based on Landsat Imagery and the Google Earth Engine Platform: A Case Study in the Middle Yangtze River Basin. Remote Sens. 2018, 10, 1635. [Google Scholar] [CrossRef] [Green Version]
  145. Lin, S.; Novitski, L.N.; Qi, J.; Stevenson, R.J. Landsat TM/ETM+ and machine-learning algorithms for limnological studies and algal bloom management of inland lakes. J. Appl. Remote Sens. 2018, 12, 026003. [Google Scholar] [CrossRef]
  146. Griffin, C.G.; McClelland, J.W.; Frey, K.E.; Fiske, G.; Holmes, R.M. Quantifying CDOM and DOC in major Arctic rivers during ice-free conditions using Landsat TM and ETM+ data. Remote Sens. Environ. 2018, 209, 395–409. [Google Scholar] [CrossRef]
  147. Isikdogan, L.F.; Bovik, A.; Passalacqua, P. Seeing Through the Clouds with DeepWaterMap. IEEE Geosci. Remote Sens. Lett. 2019, 17, 1662–1666. [Google Scholar] [CrossRef]
  148. Fang, Y.; Li, H.; Wan, W.; Zhu, S.; Wang, Z.; Hong, Y.; Wang, H. Assessment of Water Storage Change in China’s Lakes and Reservoirs over the Last Three Decades. Remote Sens. 2019, 11, 1467. [Google Scholar] [CrossRef] [Green Version]
  149. Fuentes, I.; Padarian, J.; van Ogtrop, F.; Vervoort, R.W. Comparison of Surface Water Volume Estimation Methodologies That Couple Surface Reflectance Data and Digital Terrain Models. Water 2019, 11, 780. [Google Scholar] [CrossRef] [Green Version]
  150. Markert, K.N.; Markert, A.M.; Mayer, T.; Nauman, C.; Haag, A.; Poortinga, A.; Bhandari, B.; Thwal, N.S.; Kunlamai, T.; Chishtie, F.; et al. Comparing Sentinel-1 Surface Water Mapping Algorithms and Radiometric Terrain Correction Processing in Southeast Asia Utilizing Google Earth Engine. Remote Sens. 2020, 12, 2469. [Google Scholar] [CrossRef]
  151. Wang, Y.; Li, Z.; Zeng, C.; Xia, G.; Shen, H. An Urban Water Extraction Method Combining Deep Learning and Google Earth Engine. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 768–781. [Google Scholar] [CrossRef]
  152. Peterson, K.T.; Sagan, V.; Sloan, J.J. Deep Learning-Based Water Quality Estimation and Anomaly Detection Using Landsat-8/Sentinel-2 Virtual Constellation and Cloud Computing. GISci. Remote Sens. 2020, 57, 510–525. [Google Scholar] [CrossRef]
  153. Wang, L.; Xu, M.; Liu, Y.; Liu, H.; Beck, R.; Reif, M.; Emery, E.; Young, J.; Wu, Q. Mapping Freshwater Chlorophyll-a Concentrations at a Regional Scale Integrating Multi-Sensor Satellite Observations with Google Earth Engine. Remote Sens. 2020, 12, 3278. [Google Scholar] [CrossRef]
  154. Boothroyd, R.J.; Williams, R.D.; Hoey, T.B.; Barrett, B.; Prasojo, O.A. Applications of Google Earth Engine in fluvial geomorphology for detecting river channel change. WIREs Water 2020, 8, e21496. [Google Scholar] [CrossRef]
  155. Weber, S.J.; Mishra, D.R.; Wilde, S.B.; Kramer, E. Risks for cyanobacterial harmful algal blooms due to land management and climate interactions. Sci. Total Environ. 2019, 703, 134608. [Google Scholar] [CrossRef] [PubMed]
  156. Mayer, T.; Poortinga, A.; Bhandari, B.; Nicolau, A.P.; Markert, K.; Thwal, N.S.; Markert, A.; Haag, A.; Kilbride, J.; Chishtie, F.; et al. Deep learning approach for Sentinel-1 surface water mapping leveraging Google Earth Engine. ISPRS Open J. Photogramm. Remote Sens. 2021, 2, 100005. [Google Scholar] [CrossRef]
  157. Li, J.; Peng, B.; Wei, Y.; Ye, H. Accurate extraction of surface water in complex environment based on Google Earth Engine and Sentinel-2. PLoS ONE 2021, 16, e0253209. [Google Scholar] [CrossRef]
  158. Li, Y.; Niu, Z. Systematic method for mapping fine-resolution water cover types in China based on time series Sentinel-1 and 2 images. Int. J. Appl. Earth Obs. Geoinf. ITC J. 2021, 106, 102656. [Google Scholar] [CrossRef]
  159. Farda, N.M. Multi-temporal Land Use Mapping of Coastal Wetlands Area using Machine Learning in Google Earth Engine. IOP Conf. Series Earth Environ. Sci. 2017, 98, 012042. [Google Scholar] [CrossRef]
  160. Amani, M.; Mahdavi, S.; Afshar, M.; Brisco, B.; Huang, W.; Mohammad Javad Mirzadeh, S.; White, L.; Banks, S.; Montgomery, J.; Hopkinson, C. Canadian Wetland Inventory using Google Earth Engine: The First Map and Preliminary Results. Remote Sens. 2019, 11, 842. [Google Scholar] [CrossRef] [Green Version]
  161. Mahdianpari, M.; Salehi, B.; Mohammadimanesh, F.; Homayouni, S.; Gill, E. The First Wetland Inventory Map of Newfoundland at a Spatial Resolution of 10 m Using Sentinel-1 and Sentinel-2 Data on the Google Earth Engine Cloud Computing Platform. Remote Sens. 2019, 11, 43. [Google Scholar] [CrossRef] [Green Version]
  162. DeLancey, E.R.; Kariyeva, J.; Bried, J.T.; Hird, J. Large-scale probabilistic identification of boreal peatlands using Google Earth Engine, open-access satellite data, and machine learning. PLoS ONE 2019, 14, e0218165. [Google Scholar] [CrossRef] [Green Version]
  163. Wu, Q.; Lane, C.R.; Li, X.; Zhao, K.; Zhou, Y.; Clinton, N.; DeVries, B.; Golden, H.E.; Lang, M.W. Integrating LiDAR data and multi-temporal aerial imagery to map wetland inundation dynamics using Google Earth Engine. Remote Sens. Environ. 2019, 228, 1–13. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  164. Zhang; Zhang; Dong; Liu; Gao; Hu; Wu. Mapping Tidal Flats with Landsat 8 Images and Google Earth Engine: A Case Study of China’s Eastern Coastal Zone circa 2015. Remote Sens. 2019, 11, 924. [Google Scholar] [CrossRef] [Green Version]
  165. Mahdianpari, M.; Salehi, B.; Mohammadimanesh, F.; Brisco, B.; Homayouni, S.; Gill, E.; DeLancey, E.R.; Bourgeau-Chavez, L. Big Data for a Big Country: The First Generation of Canadian Wetland Inventory Map at a Spatial Resolution of 10-m Using Sentinel-1 and Sentinel-2 Data on the Google Earth Engine Cloud Computing Platform. Can. J. Remote Sens. 2020, 46, 15–33. [Google Scholar] [CrossRef]
  166. Hakdaoui, S.; Emran, A.; Pradhan, B.; Qninba, A.; El Balla, T.; Mfondoum, A.H.N.; Lee, C.-W.; Alamri, A.M. Assessing the Changes in the Moisture/Dryness of Water Cavity Surfaces in Imlili Sebkha in Southwestern Morocco by Using Machine Learning Classification in Google Earth Engine. Remote Sens. 2020, 12, 131. [Google Scholar] [CrossRef] [Green Version]
  167. DeLancey, E.R.; Simms, J.F.; Mahdianpari, M.; Brisco, B.; Mahoney, C.; Kariyeva, J. Comparing Deep Learning and Shallow Learning for Large-Scale Wetland Classification in Alberta, Canada. Remote Sens. 2019, 12, 2. [Google Scholar] [CrossRef] [Green Version]
  168. Mahdianpari, M.; Brisco, B.; Granger, J.E.; Mohammadimanesh, F.; Salehi, B.; Banks, S.; Homayouni, S.; Bourgeau-Chavez, L.; Weng, Q. The Second Generation Canadian Wetland Inventory Map at 10 Meters Resolution Using Google Earth Engine. Can. J. Remote Sens. 2020, 46, 360–375. [Google Scholar] [CrossRef]
  169. Wang, X.; Xiao, X.; Zou, Z.; Chen, B.; Ma, J.; Dong, J.; Doughty, R.B.; Zhong, Q.; Qin, Y.; Dai, S.; et al. Tracking annual changes of coastal tidal flats in China during 1986–2016 through analyses of Landsat images with Google Earth Engine. Remote Sens. Environ. 2018, 238, 110987. [Google Scholar] [CrossRef]
  170. Mahdianpari, M.; Jafarzadeh, H.; Granger, J.E.; Mohammadimanesh, F.; Brisco, B.; Salehi, B.; Homayouni, S.; Weng, Q. A large-scale change monitoring of wetlands using time series Landsat imagery on Google Earth Engine: A case study in Newfoundland. GISci. Remote Sens. 2020, 57, 1102–1124. [Google Scholar] [CrossRef]
  171. Sahour, H.; Kemink, K.M.; O’Connell, J. Integrating SAR and Optical Remote Sensing for Conservation-Targeted Wetlands Mapping. Remote Sens. 2021, 14, 159. [Google Scholar] [CrossRef]
  172. Jia, M.; Wang, Z.; Mao, D.; Ren, C.; Wang, C.; Wang, Y. Rapid, robust, and automated mapping of tidal flats in China using time series Sentinel-2 images and Google Earth Engine. Remote Sens. Environ. 2021, 255, 112285. [Google Scholar] [CrossRef]
  173. van Deventer, H.; Cho, M.A.; Mutanga, O. Multi-season RapidEye imagery improves the classification of wetland and dryland communities in a subtropical coastal region. ISPRS J. Photogramm. Remote Sens. 2019, 157, 171–187. [Google Scholar] [CrossRef]
  174. Ye, X.-C.; Meng, Y.-K.; Xu, L.-G.; Xu, C.-Y. Net primary productivity dynamics and associated hydrological driving factors in the floodplain wetland of China’s largest freshwater lake. Sci. Total Environ. 2019, 659, 302–313. [Google Scholar] [CrossRef] [PubMed]
  175. Dalezios, N.R.; Dercas, N.; Eslamian, S.S. Water scarcity management: Part 2: Satellite-based composite drought analysis. Int. J. Glob. Environ. Issues 2018, 17, 262. [Google Scholar] [CrossRef]
  176. Zhang, M.; Lin, H. Wetland classification using parcel-level ensemble algorithm based on Gaofen-6 multispectral imagery and Sentinel-1 dataset. J. Hydrol. 2022, 606, 127462. [Google Scholar] [CrossRef]
  177. Guo, Y.; Jia, X.; Paull, D.; Benediktsson, J.A. Nomination-favoured opinion pool for optical-SAR-synergistic rice mapping in face of weakened flooding signals. ISPRS J. Photogramm. Remote Sens. 2019, 155, 187–205. [Google Scholar] [CrossRef]
  178. Goldblatt, R.; You, W.; Hanson, G.; Khandelwal, A.K. Detecting the Boundaries of Urban Areas in India: A Dataset for Pixel-Based Image Classification in Google Earth Engine. Remote Sens. 2016, 8, 634. [Google Scholar] [CrossRef] [Green Version]
  179. Huang, C.; Yang, J.; Jiang, P. Assessing Impacts of Urban Form on Landscape Structure of Urban Green Spaces in China Using Landsat Images Based on Google Earth Engine. Remote Sens. 2018, 10, 1569. [Google Scholar] [CrossRef] [Green Version]
  180. Xu, H.; Wei, Y.; Liu, C.; Li, X.; Fang, H. A Scheme for the Long-Term Monitoring of Impervious-Relevant Land Disturbances Using High Frequency Landsat Archives and the Google Earth Engine. Remote Sens. 2019, 11, 1891. [Google Scholar] [CrossRef] [Green Version]
  181. Zhong, Q.; Ma, J.; Zhao, B.; Wang, X.; Zong, J.; Xiao, X. Assessing spatial-temporal dynamics of urban expansion, vegetation greenness and photosynthesis in megacity Shanghai, China during 2000–2016. Remote Sens. Environ. 2019, 233, 111374. [Google Scholar] [CrossRef]
  182. Lin, Y.; Zhang, H.; Lin, H.; Gamba, P.E.; Liu, X. Incorporating synthetic aperture radar and optical images to investigate the annual dynamics of anthropogenic impervious surface at large scale. Remote Sens. Environ. 2020, 242, 111757. [Google Scholar] [CrossRef]
  183. Liu, D.; Chen, N.; Zhang, X.; Wang, C.; Du, W. Annual large-scale urban land mapping based on Landsat time series in Google Earth Engine and OpenStreetMap data: A case study in the middle Yangtze River basin. ISPRS J. Photogramm. Remote Sens. 2019, 159, 337–351. [Google Scholar] [CrossRef]
  184. Mugiraneza, T.; Nascetti, A.; Ban, Y. Continuous Monitoring of Urban Land Cover Change Trajectories with Landsat Time Series and LandTrendr-Google Earth Engine Cloud Computing. Remote Sens. 2020, 12, 2883. [Google Scholar]
  185. Lin, J.; Jin, X.; Ren, J.; Liu, J.; Liang, X.; Zhou, Y. Rapid Mapping of Large-Scale Greenhouse Based on Integrated Learning Algorithm and Google Earth Engine. Remote Sens. 2021, 13, 1245. [Google Scholar] [CrossRef]
  186. Carneiro, E.; Lopes, W.; Espindola, G. Urban Land Mapping Based on Remote Sensing Time Series in the Google Earth Engine Platform: A Case Study of the Teresina-Timon Conurbation Area in Brazil. Remote Sens. 2021, 13, 1338. [Google Scholar] [CrossRef]
  187. Zhang, Z.; Wei, M.; Pu, D.; He, G.; Wang, G.; Long, T. Assessment of Annual Composite Images Obtained by Google Earth Engine for Urban Areas Mapping Using Random Forest. Remote Sens. 2021, 13, 748. [Google Scholar] [CrossRef]
  188. Samat, A.; Gamba, P.; Wang, W.; Luo, J.; Li, E.; Liu, S.; Du, P.; Abuduwaili, J. Mapping Blue and Red Color-Coated Steel Sheet Roof Buildings over China Using Sentinel-2A/B MSIL2A Images. Remote Sens. 2022, 14, 230. [Google Scholar] [CrossRef]
  189. Parks, S.A.; Holsinger, L.M.; Koontz, M.J.; Collins, L.; Whitman, E.; Parisien, M.-A.; Loehman, R.A.; Barnes, J.L.; Bourdon, J.-F.; Boucher, J.; et al. Giving Ecological Meaning to Satellite-Derived Fire Severity Metrics across North American Forests. Remote Sens. 2019, 11, 1735. [Google Scholar] [CrossRef] [Green Version]
  190. Quintero, N.; Viedma, O.; Urbieta, I.R.; Moreno, J.M. Assessing Landscape Fire Hazard by Multitemporal Automatic Classification of Landsat Time Series Using the Google Earth Engine in West-Central Spain. Forests 2019, 10, 518. [Google Scholar] [CrossRef] [Green Version]
  191. Long, T.; Zhang, Z.; He, G.; Jiao, W.; Tang, C.; Wu, B.; Zhang, X.; Wang, G.; Yin, R. 30 m Resolution Global Annual Burned Area Mapping Based on Landsat Images and Google Earth Engine. Remote Sens. 2019, 11, 489. [Google Scholar] [CrossRef] [Green Version]
  192. Bar, S.; Parida, B.R.; Pandey, A.C. Landsat-8 and Sentinel-2 based Forest fire burn area mapping using machine learning algorithms on GEE cloud platform over Uttarakhand, Western Himalaya. Remote Sens. Appl. Soc. Environ. 2020, 18, 100324. [Google Scholar] [CrossRef]
  193. Sulova, A.; Arsanjani, J.J. Exploratory Analysis of Driving Force of Wildfires in Australia: An Application of Machine Learning within Google Earth Engine. Remote Sens. 2021, 13, 10. [Google Scholar] [CrossRef]
  194. Zhang, Z.; He, G.; Long, T.; Tang, C.; Wei, M.; Wang, W.; Wang, G. Spatial Pattern Analysis of Global Burned Area in 2005 Based on Landsat Satellite Images. IOP Conf. Ser. Earth Environ. Sci. 2020, 428, 012078. [Google Scholar] [CrossRef]
  195. Seydi, S.; Akhoondzadeh, M.; Amani, M.; Mahdavi, S. Wildfire Damage Assessment over Australia Using Sentinel-2 Imagery and MODIS Land Cover Product within the Google Earth Engine Cloud Platform. Remote Sens. 2021, 13, 220. [Google Scholar] [CrossRef]
  196. Arruda, V.L.; Piontekowski, V.J.; Alencar, A.; Pereira, R.S.; Matricardi, E.A. An alternative approach for mapping burn scars using Landsat imagery, Google Earth Engine, and Deep Learning in the Brazilian Savanna. Remote Sens. Appl. Soc. Environ. 2021, 22, 100472. [Google Scholar] [CrossRef]
  197. Waller, E.K.; Villarreal, M.L.; Poitras, T.B.; Nauman, T.W.; Duniway, M.C. Landsat time series analysis of fractional plant cover changes on abandoned energy development sites. Int. J. Appl. Earth Obs. Geoinf. ITC J. 2018, 73, 407–419. [Google Scholar] [CrossRef]
  198. Lobo, F.D.L.; Souza-Filho, P.W.M.; Novo, E.M.L.D.M.; Carlos, F.M.; Barbosa, C.C.F. Mapping Mining Areas in the Brazilian Amazon Using MSI/Sentinel-2 Imagery (2017). Remote Sens. 2018, 10, 1178. [Google Scholar] [CrossRef] [Green Version]
  199. Xiao, W.; Deng, X.; He, T.; Chen, W. Mapping Annual Land Disturbance and Reclamation in a Surface Coal Mining Region Using Google Earth Engine and the LandTrendr Algorithm: A Case Study of the Shengli Coalfield in Inner Mongolia, China. Remote Sens. 2020, 12, 1612. [Google Scholar] [CrossRef]
  200. Balaniuk, R.; Isupova, O.; Reece, S. Mining and Tailings Dam Detection in Satellite Imagery Using Deep Learning. Sensors 2020, 20, 6936. [Google Scholar] [CrossRef]
  201. Fuentes, M.; Millard, K.; Laurin, E. Big geospatial data analysis for Canada’s Air Pollutant Emissions Inventory (APEI): Using google earth engine to estimate particulate matter from exposed mine disturbance areas. GISci. Remote Sens. 2019, 57, 245–257. [Google Scholar] [CrossRef]
  202. He, T.; Xiao, W.; Zhao, Y.; Deng, X.; Hu, Z. Identification of waterlogging in Eastern China induced by mining subsidence: A case study of Google Earth Engine time-series analysis applied to the Huainan coal field. Remote Sens. Environ. 2020, 242, 111742. [Google Scholar] [CrossRef]
  203. Zhou, L.; Luo, T.; Du, M.; Chen, Q.; Liu, Y.; Zhu, Y.; He, C.; Wang, S.; Yang, K. Machine Learning Comparison and Parameter Setting Methods for the Detection of Dump Sites for Construction and Demolition Waste Using the Google Earth Engine. Remote Sens. 2021, 13, 787. [Google Scholar] [CrossRef]
  204. Chrysoulakis, N.; Mitraka, Z.; Gorelick, N. Exploiting satellite observations for global surface albedo trends monitoring. Arch. Meteorol. Geophys. Bioclimatol. Ser. B 2018, 137, 1171–1179. [Google Scholar] [CrossRef]
  205. Chastain, R.; Housman, I.; Goldstein, J.; Finco, M.; Tenneson, K. Empirical Cross Sensor Comparison of Sentinel-2A and 2B MSI, Landsat-8 OLI, and Landsat-7 ETM+ Top of Atmosphere Spectral Characteristics over the Conterminous United States. Remote Sens. Environ. 2019, 221, 274–285. [Google Scholar] [CrossRef]
  206. Demuzere, M.; Bechtel, B.; Mills, G. Global transferability of local climate zone models. Urban Clim. 2018, 27, 46–63. [Google Scholar] [CrossRef]
  207. Ranagalage, M.; Murayama, Y.; Dissanayake, D.; Simwanda, M. The Impacts of Landscape Changes on Annual Mean Land Surface Temperature in the Tropical Mountain City of Sri Lanka: A Case Study of Nuwara Eliya (1996–2017). Sustainability 2019, 11, 5517. [Google Scholar] [CrossRef] [Green Version]
  208. Medina-Lopez, E.; Ureña-Fuentes, L. High-Resolution Sea Surface Temperature and Salinity in the Global Ocean from Raw Satellite Data. Remote Sens. 2019, 11, 2191. [Google Scholar] [CrossRef] [Green Version]
  209. Besnard, S.; Carvalhais, N.; Arain, M.A.; Black, A.; Brede, B.; Buchmann, N.; Chen, J.; Clevers, J.; Dutrieux, L.P.; Gans, F.; et al. Memory effects of climate and vegetation affecting net ecosystem CO2 fluxes in global forests. PLoS ONE 2019, 14, e0211510. [Google Scholar] [CrossRef] [Green Version]
  210. Elnashar, A.; Zeng, H.; Wu, B.; Zhang, N.; Tian, F.; Zhang, M.; Zhu, W.; Yan, N.; Chen, Z.; Sun, Z.; et al. Downscaling TRMM Monthly Precipitation Using Google Earth Engine and Google Cloud Computing. Remote Sens. 2020, 12, 3860. [Google Scholar] [CrossRef]
  211. Yu, B.; Chen, F.; Muhammad, S. Analysis of satellite-derived landslide at Central Nepal from 2011 to 2016. Environ. Earth Sci. 2018, 77, 331. [Google Scholar] [CrossRef]
  212. Cho, E.; Jacobs, J.M.; Jia, X.; Kraatz, S. Identifying Subsurface Drainage using Satellite Big Data and Machine Learning via Google Earth Engine. Water Resour. Res. 2019, 55, 8028–8045. [Google Scholar] [CrossRef]
  213. Uddin, K.; Matin, M.A.; Meyer, F.J. Operational Flood Mapping Using Multi-Temporal Sentinel-1 SAR Images: A Case Study from Bangladesh. Remote Sens. 2019, 11, 1581. [Google Scholar] [CrossRef] [Green Version]
  214. Vanama, V.S.K.; Mandal, D.; Rao, Y.S. GEE4FLOOD: Rapid mapping of flood areas using temporal Sentinel-1 SAR images with Google Earth Engine cloud platform. J. Appl. Remote Sens. 2020, 14, 034505. [Google Scholar] [CrossRef]
  215. Ghaffarian, S.; Rezaie Farhadabad, A.; Kerle, N. Post-Disaster Recovery Monitoring with Google Earth Engine. Appl. Sci. 2020, 10, 4574. [Google Scholar] [CrossRef]
  216. Kakooei, M.; Baleghi, Y. A two-level fusion for building irregularity detection in post-disaster VHR oblique images. Earth Sci. Inform. 2020, 13, 459–477. [Google Scholar] [CrossRef]
  217. Padarian, J.; Minasny, B.; McBratney, A. Using Google’s cloud-based platform for digital soil mapping. Comput. Geosci. 2015, 83, 80–88. [Google Scholar] [CrossRef]
  218. Ivushkin, K.; Bartholomeus, H.; Bregt, A.K.; Pulatov, A.; Kempen, B.; de Sousa, L. Global mapping of soil salinity change. Remote Sens. Environ. 2019, 231, 111260. [Google Scholar] [CrossRef]
  219. Poppiel, R.R.; Lacerda, M.P.C.; Safanelli, J.L.; Rizzo, R.; Oliveira, M.P., Jr.; Novais, J.J.; Demattê, J.A.M. Mapping at 30 m Resolution of Soil Attributes at Multiple Depths in Midwest Brazil. Remote Sens. 2019, 11, 2905. [Google Scholar] [CrossRef] [Green Version]
  220. Cao, B.; Domke, G.M.; Russell, M.B.; Walters, B.F. Spatial modeling of litter and soil carbon stocks on forest land in the conterminous United States. Sci. Total Environ. 2018, 654, 94–106. [Google Scholar] [CrossRef]
  221. Greifeneder, F.; Notarnicola, C.; Wagner, W. A Machine Learning-Based Approach for Surface Soil Moisture Estimations with Google Earth Engine. Remote Sens. 2021, 13, 2099. [Google Scholar] [CrossRef]
  222. Zhang, M.; Zhang, M.; Yang, H.; Jin, Y.; Zhang, X.; Liu, H. Mapping Regional Soil Organic Matter Based on Sentinel-2A and MODIS Imagery Using Machine Learning Algorithms and Google Earth Engine. Remote Sens. 2021, 13, 2934. [Google Scholar] [CrossRef]
  223. Gómez-Chova, L.; Amorós-López, J.; Mateo-García, G.; Muñoz-Marí, J.; Camps-Valls, G. Cloud masking and removal in remote sensing image time series. J. Appl. Remote Sens. 2017, 11, 015005. [Google Scholar] [CrossRef]
  224. Mateo-García, G.; Gómez-Chova, L.; Amorós-López, J.; Muñoz-Marí, J.; Camps-Valls, G. Multitemporal Cloud Masking in the Google Earth Engine. Remote Sens. 2018, 10, 1079. [Google Scholar] [CrossRef] [Green Version]
  225. Yin, Z.; Ling, F.; Foody, G.M.; Li, X.; Du, Y. Cloud detection in Landsat-8 imagery in Google Earth Engine based on a deep convolutional neural network. Remote Sens. Lett. 2020, 11, 1181–1190. [Google Scholar] [CrossRef]
  226. Li, J.; Wang, L.; Liu, S.; Peng, B.; Ye, H. An automatic cloud detection model for Sentinel-2 imagery based on Google Earth Engine. Remote Sens. Lett. 2021, 13, 196–206. [Google Scholar] [CrossRef]
  227. Zhang, X.; Qiu, Z.; Peng, C.; Ye, P. Removing cloud cover interference from Sentinel-2 imagery in Google Earth Engine by fusing Sentinel-1 SAR data with a CNN model. Int. J. Remote Sens. 2021, 43, 132–147. [Google Scholar] [CrossRef]
  228. Meraner, A.; Ebel, P.; Zhu, X.X.; Schmitt, M. Cloud removal in Sentinel-2 imagery using a deep residual neural network and SAR-optical data fusion. ISPRS J. Photogramm. Remote Sens. 2020, 166, 333–346. [Google Scholar] [CrossRef]
  229. Carrasco-Escobar, G.; Manrique, E.; Ruiz-Cabrejos, J.; Saavedra, M.; Alava, F.; Bickersmith, S.; Prussing, C.; Vinetz, J.M.; Conn, J.; Moreno, M.; et al. High-accuracy detection of malaria vector larval habitats using drone-based multispectral imagery. PLoS Negl. Trop. Dis. 2019, 13, e0007105. [Google Scholar] [CrossRef] [Green Version]
  230. Ascensão, F.; Yogui, D.R.; Alves, M.; Medici, E.P.; Desbiez, A. Predicting spatiotemporal patterns of road mortality for medium-large mammals. J. Environ. Manag. 2019, 248, 109320. [Google Scholar] [CrossRef]
  231. Lyons, M.B.; Brandis, K.J.; Murray, N.J.; Wilshire, J.H.; McCann, J.A.; Kingsford, R.T.; Callaghan, C.T. Monitoring large and complex wildlife aggregations with drones. Methods Ecol. Evol. 2019, 10, 1024–1035. [Google Scholar] [CrossRef] [Green Version]
  232. Pérez-Romero, J.; Navarro-Cerrillo, R.M.; Palacios-Rodriguez, G.; Acosta, C.; Mesas-Carrascosa, F.J. Improvement of Remote Sensing-Based Assessment of Defoliation of Pinus spp. Caused by Thaumetopoea pityocampa Denis and Schiffermüller and Related Environmental Drivers in Southeastern Spain. Remote Sens. 2019, 11, 1736. [Google Scholar]
  233. Liss, B.; Howland, M.D.; Levy, T.E. Testing Google Earth Engine for the automatic identification and vectorization of archaeological features: A case study from Faynan, Jordan. J. Archaeol. Sci. Rep. 2017, 15, 299–304. [Google Scholar] [CrossRef] [Green Version]
  234. Orengo, H.; Garcia-Molsosa, A. A brave new world for archaeological survey: Automated machine learning-based potsherd detection using high-resolution drone imagery. J. Archaeol. Sci. 2019, 112, 105013. [Google Scholar] [CrossRef]
  235. Orengo, H.A.; Conesa, F.C.; Garcia-Molsosa, A.; Lobo, A.; Green, A.S.; Madella, M.; Petrie, C.A. Automated detection of archaeological mounds using machine-learning classification of multisensor and multitemporal satellite data. Proc. Natl. Acad. Sci. USA 2020, 117, 18240–18250. [Google Scholar] [CrossRef] [PubMed]
  236. Hagenaars, G.; de Vries, S.; Luijendijk, A.P.; de Boer, W.P.; Reniers, A.J. On the accuracy of automated shoreline detection derived from satellite imagery: A case study of the sand motor mega-scale nourishment. Coast. Eng. 2018, 133, 113–125. [Google Scholar] [CrossRef]
  237. Vos, K.; Harley, M.D.; Splinter, K.D.; Simmons, J.A.; Turner, I.L. Sub-annual to multi-decadal shoreline variability from publicly available satellite imagery. Coast. Eng. 2019, 150, 160–174. [Google Scholar] [CrossRef]
  238. Cao, W.; Zhou, Y.; Li, R.; Li, X. Mapping changes in coastlines and tidal flats in developing islands using the full time series of Landsat images. Remote Sens. Environ. 2020, 239, 111665. [Google Scholar] [CrossRef]
  239. Traganos, D.; Poursanidis, D.; Aggarwal, B.; Chrysoulakis, N.; Reinartz, P. Estimating Satellite-Derived Bathymetry (SDB) with the Google Earth Engine and Sentinel-2. Remote Sens. 2018, 10, 859. [Google Scholar] [CrossRef] [Green Version]
  240. Sagawa, T.; Yamashita, Y.; Okumura, T.; Yamanokuchi, T. Satellite Derived Bathymetry Using Machine Learning and Multi-Temporal Satellite Images. Remote Sens. 2019, 11, 1155. [Google Scholar] [CrossRef] [Green Version]
  241. Tedesche, M.E.; Trochim, E.D.; Fassnacht, S.R.; Wolken, G.J. Extent Changes in the Perennial Snowfields of Gates of the Arctic National Park and Preserve, Alaska. Hydrology 2019, 6, 53. [Google Scholar] [CrossRef] [Green Version]
  242. Qi, M.; Liu, S.; Yao, X.; Xie, F.; Gao, Y. Monitoring the Ice Phenology of Qinghai Lake from 1980 to 2018 Using Multisource Remote Sensing Data and Google Earth Engine. Remote Sens. 2020, 12, 2217. [Google Scholar] [CrossRef]
  243. Yang, L.; Cervone, G. Analysis of remote sensing imagery for disaster assessment using deep learning: A case study of flooding event. Soft Comput. 2019, 23, 13393–13408. [Google Scholar] [CrossRef]
  244. Davies, D.K.; Murphy, K.J.; Michael, K.; Becker-Reshef, I.; Justice, C.O.; Boller, R.; Braun, S.A.; Schmaltz, J.E.; Wong, M.M.; Pasch, A.N.; et al. The Use of NASA LANCE Imagery and Data for Near Real-Time Applications. In Time-Sensitive Remote Sensing; Lippitt, C.D., Stow, D.A., Coulter, L.L., Eds.; Springer: New York, NY, USA, 2015; pp. 165–182. ISBN 9781493926022. [Google Scholar]
  245. Lippitt, C.D.; Stow, D.A.; Riggan, P.J. Application of the remote-sensing communication model to a time-sensitive wildfire remote-sensing system. Int. J. Remote Sens. 2016, 37, 3272–3292. [Google Scholar] [CrossRef]
  246. Hoffmann, J.; Borgeaud, S.; Mensch, A.; Buchatskaya, E.; Cai, T.; Rutherford, E.; de Las Casas, D.; Hendricks, L.A.; Welbl, J.; Clark, A.; et al. Training Compute-Optimal Large Language Models. arXiv 2022, arXiv:2203.15556. [Google Scholar]
  247. Banko, M.; Brill, E. Scaling to Very Very Large Corpora for Natural Language Disambiguation. In Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, Toulouse, France, 6–11 July 2001; pp. 26–33. [Google Scholar]
  248. Press, G. Andrew Ng Launches a Campaign for Data-Centric AI. Available online: https://www.forbes.com/sites/gilpress/2021/06/16/andrew-ng-launches-a-campaign-for-data-centric-ai/ (accessed on 25 April 2022).
  249. Pratt, L.Y. Discriminability-Based Transfer between Neural Networks. In Advances in Neural Information Processing Systems 5; Hanson, S.J., Cowan, J.D., Giles, C.L., Eds.; Morgan-Kaufmann: Burlington, MA, USA, 1993; pp. 204–211. [Google Scholar]
  250. Pan, S.J.; Yang, Q. A Survey on Transfer Learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359. [Google Scholar] [CrossRef]
  251. Weiss, K.; Khoshgoftaar, T.M.; Wang, D.D. A survey of transfer learning. J. Big Data 2016, 3, 1345–1459. [Google Scholar] [CrossRef] [Green Version]
  252. Tan, C.; Sun, F.; Kong, T.; Zhang, W.; Yang, C.; Liu, C. A Survey on Deep Transfer Learning. In Proceedings of the International Conference on Artificial Neural Networks; Springer: Cham, Switzerland, 2018; pp. 270–279. [Google Scholar]
  253. Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H.; Xiong, H.; He, Q. A Comprehensive Survey on Transfer Learning. Proc. IEEE 2021, 109, 43–76. [Google Scholar] [CrossRef]
  254. Li, C.; Zhang, S.; Qin, Y.; Estupinan, E. A systematic review of deep transfer learning for machinery fault diagnosis. Neurocomputing 2020, 407, 121–135. [Google Scholar] [CrossRef]
  255. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
  256. Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar]
  257. Bar, Y.; Diamant, I.; Wolf, L.; Lieberman, S.; Konen, E.; Greenspan, H. Chest Pathology Detection Using Deep Learning with Non-Medical Training. In Proceedings of the 2015 IEEE 12th International Symposium on Biomedical Imaging (ISBI), New York, NY, USA, 16–19 April 2015; pp. 294–297. [Google Scholar]
  258. Maaten, L.; Chen, M.; Tyree, S.; Weinberger, K. Learning with Marginalized Corrupted Features. In Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; pp. 410–418. [Google Scholar]
  259. Gillies, M.; Fiebrink, R.; Tanaka, A.; Garcia, J.; Bevilacqua, F.; Heloir, A.; Nunnari, F.; Mackay, W.; Amershi, S.; Lee, B.; et al. Human-Centred Machine Learning. In Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems 2016, San Jose, CA, USA, 7–12 May 2016. [Google Scholar]
  260. Wu, Q. geemap: A Python package for interactive mapping with Google Earth Engine. J. Open Source Softw. 2020, 5, 2305. [Google Scholar] [CrossRef]
  261. Aybar, C.; Wu, Q.; Bautista, L.; Yali, R.; Barja, A. rgee: An R package for interacting with Google Earth Engine. J. Open Source Softw. 2020, 5, 2272. [Google Scholar] [CrossRef]
  262. Huntington, J.L.; Hegewisch, K.C.; Daudert, B.; Morton, C.G.; Abatzoglou, J.T.; McEvoy, D.J.; Erickson, T. Climate Engine: Cloud Computing and Visualization of Climate and Remote Sensing Data for Advanced Natural Resource Monitoring and Process Understanding. Bull. Am. Meteorol. Soc. 2017, 98, 2397–2410. [Google Scholar] [CrossRef]
  263. Li, H.; Wan, W.; Fang, Y.; Zhu, S.; Chen, X.; Liu, B.; Hong, Y. A Google Earth Engine-enabled software for efficiently generating high-quality user-ready Landsat mosaic images. Environ. Model. Softw. 2018, 112, 16–22. [Google Scholar] [CrossRef]
  264. Yang, L.; Driscol, J.; Sarigai, S.; Wu, Q.; Lippitt, C.D.; Morgan, M. Towards Synoptic Water Monitoring Systems: A Review of AI Methods for Automating Water Body Detection and Water Quality Monitoring Using Remote Sensing. Sensors 2022, 22, 2416. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Geospatial distribution and overview statistics of the 200 reviewed papers. (a) Spatial distribution of the reviewed papers based on the first author’s institution location, (b) number of papers published per year from 2015 to early 2022, (c) journals in which the reviewed papers were published, and (d) country distribution. Note that a freely accessible, interactive version of the map and all charts in this paper can be accessed via our web app tool (the web app URL and a brief demo video are provided in Appendix A).
Figure 2. Word-cloud visualization of all 200 reviewed papers that leverage GEE and AI.
Figure 3. Overview statistics of the methods used in the 200 reviewed papers. (a) Method- or application-oriented, (b) methods at the macro level (CV, ML, or DL), and (c) methods at the detailed level (classification, regression, or segmentation).
Figure 4. Statistics of the reviewed papers in terms of application focus (a), study area (b), and RS data type used (c).
Figure 5. Statistics of the models compared and the evaluation metrics used in the 200 reviewed papers. (a) Models used and compared in the reviewed studies and (b) evaluation metrics used.
Figure 6. Word-cloud visualization of all the reviewed papers targeting crop mapping (i.e., those 37 papers summarized in Table 1).
Figure 7. Word-cloud visualization of all the reviewed papers targeting LULC application (i.e., those 27 papers summarized in Table 2).
Figure 8. Word-cloud visualization of all the reviewed papers targeting forest and deforestation monitoring (i.e., those 20 papers summarized in Table 3).
Figure 9. Word-cloud visualization of all the reviewed papers targeting vegetation mapping (i.e., those 18 papers summarized in Table 4).
Figure 10. Word-cloud visualization of all the reviewed papers targeting water mapping and water quality monitoring (i.e., those 18 papers summarized in Table 5).
Figure 11. Word-cloud visualization of all the reviewed papers targeting wetland mapping (i.e., those 16 papers summarized in Table 6).
Figure 12. Word-cloud visualization of all the reviewed papers targeting infrastructure and building detection, urbanization monitoring (i.e., those 11 papers summarized in Table 7).
Figure 13. Word-cloud visualization of all the reviewed papers targeting wildfires and burned areas (i.e., those eight papers summarized in Table 8).
Figure 14. Word-cloud visualization of all the reviewed papers targeting heavy industry and pollution monitoring (i.e., those seven papers summarized in Table 9).
Figure 15. Word-cloud visualization of all the reviewed papers targeting climate and meteorology (i.e., those seven papers summarized in Table 10).
Figure 16. Word-cloud visualization of all the reviewed papers targeting disaster management (i.e., those six papers summarized in Table 11).
Figure 17. Word-cloud visualization of all the reviewed papers targeting soil (i.e., those six papers summarized in Table 12).
Figure 18. Word-cloud visualization of all the reviewed papers targeting cloud detection and masking (i.e., those five papers summarized in Table 13).
Figure 19. Word-cloud visualization of all the reviewed papers targeting wildlife and animal studies (i.e., those four papers summarized in Table 14).
Figure 20. Word-cloud visualization of all the reviewed papers targeting archaeology (i.e., those three papers summarized in Table 15).
Figure 21. Word-cloud visualization of all the reviewed papers targeting coastline monitoring (i.e., those three papers summarized in Table 16).
Figure 22. Word-cloud visualization of all the reviewed papers targeting bathymetric mapping (i.e., those papers summarized in Table 17).
Figure 23. Word-cloud visualization of all the reviewed papers targeting ice and snow (i.e., those two papers summarized in Table 18).
Figure 24. Word-cloud visualization of reviewed 21 novel methods papers (all those 21 papers from Table 19, Table 20 and Table 21).
Figure 25. Statistics on whether analyses in the 200 reviewed papers were computed online in the cloud or offline on local computers. (a) Computed online on cloud platforms, (b) computed offline on local machines. NA refers to “not applicable”, indicating that a publication’s code ran solely on cloud platform(s); NS means “not specified”.
Figure 26. Statistics on the software and/or programming languages used in the 200 reviewed papers. NA refers to “not applicable”, meaning that those papers used only GEE to complete their analysis.
Table 1. Studies targeting crop mapping from RS imagery using AI (note that references marked with * denote novel methods, which are detailed in Section 3.3).
ReferencesMethodModel ComparisonRS Data TypeStudy Area
Lobell et al. (2015) [45]regressionmultiple linear regressionCropland Data Layer, Landsat 5, Landsat 7United States
Shelestov et al. (2017) [46]classificationCART, ensemble NN, IKPamir, MLP, NB, RF, SVMLandsat 8 OLI, Landsat 8 TOAUkraine
Xiong et al. (2017) [47]classification, segmentationRF, RHSeg, SVMLandsat 8 OLI TOA, Sentinel-2 MSI TOA, SRTM DEMAfrica (continent)
Xiong et al. (2017) [48]classificationDT, k-meansAfricover, CUI, FROMGC, GCEV1, GLC 2000, Global30, Globcover, GRIPC, IKONOS, LULC 2000, MCD12Q1, MYD13, QuickBird, WorldView 2Africa (continent)
Deines et al. (2017) [49]classification, regressionCART, RFAIM-RRB, Cropland Data Layer, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI, NLCD, USGS MIrAD-USUnited States
Teluguntla et al. (2018) [16]classificationRFGoogle Earth Pro, Landsat 7 TM, Landsat 8 OLI, National Geospatial AgencyAustralia, China
Kelley et al. (2018) [50]classificationRFLandsat 8 TOA, SRTM DEMNicaragua
Ragettli et al. (2018) [51]classificationk-means, region growing clustering algorithm, RFLandsat 7, Landsat 8, MOD09A1, MOD09Q1, MYD09A1, MYD09Q1, Sentinel 2, SRTM DEMKazakhstan, Kyrgyzstan
Ghazaryan et al. (2018) [52]classificationdecision fusion, RF, SVMLandsat 8 OLI, Sentinel-1Ukraine
Mandal et al. (2018) [53]classificationk-meansSentinel-1India
Oliphant et al. (2019) [54]classificationRFGeoEye, Landsat 7 ETM+, Landsat 8 OLI, National Geospatial Agency, Quickbird, SRTM DEM, WorldViewBrunei, Cambodia, Indonesia, Japan, Laos, Malaysia, Myanmar, North Korean, Philippines, South Korean, Thailand, Vietnam
Sun et al. (2019) [55]regressionCNN, CNN-LSTM hybrid, LSTMCropland Data Layer, MOD11A2, MOD09A1United States
Wang et al. (2019) [56]classificationANN, CART, RF, SVMSentinel 2China
Tian et al. (2019) [57]classificationRFSentinel-1, Sentinel-2 MSI, SRTM DEMChina
Xie et al. (2019) [58]classificationRFCropland Data Layer, GIAM, GMIA, LANID, Landsat 5, Landsat 7, Landsat 8, MIrAD-US, MOD11A2, MOD13Q1, NAIP, NLCDUnited States
Jin et al. (2019) [59] | classification, regression | linear regression, RF | Sentinel-1, Sentinel-2 MSI, MYD11A2 | Kenya, Tanzania
Rudiyanto et al. (2019) [60] | classification | ANN, k-means, RF, SVM | Google Earth, Sentinel-1 | Indonesia, Malaysia
Wang et al. (2019) [61] | classification | GMM, k-means, RF | Cropland Data Layer, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI | United States
Liang et al. (2019) [62] | classification | CART | Cropland Data Layer, Landsat 5 TM, Landsat 8 OLI | United States
Tian et al. (2019) [63] | classification | DT | Landsat 7, Landsat 8, MOD13Q1, Sentinel-2, SRTM DEM | China
Neetu et al. (2019) [64] | classification | CART, RF, SVM | Sentinel-2 MSI | India
Gumma et al. (2020) [65] | classification | RF | GeoEye, GDEM, IKONOS, Landsat 7, Landsat 8, Quickbird, WorldView | Bangladesh, Bhutan, India, Nepal, Pakistan, Sri Lanka
Han et al. (2020) [66] | regression | BGT, BST, DT, GPR, KNN, NN, RF, SVM | MODIS Terra | China
Phalke et al. (2020) [67] | classification | CART, RF, SVM | Google Earth, Landsat 7 ETM+, Landsat 8 OLI, National Geospatial Agency, SRTM DEM | Asia, Europe, Middle East (multiple countries)
Samasse et al. (2020) [19] | classification | RF | Landsat 8 OLI, Rapid Land Cover Mapper | Burkina Faso, Mali, Mauritania, Niger, Senegal
Chen et al. (2020) [68] | classification | RF | JRC Global Surface Water, Sentinel-1, Sentinel-2 MSI | China
Amani et al. (2020) [69] * | classification, segmentation | ANN, SNIC | Canada’s Annual Crop Inventory, MCD12Q1, Sentinel-1, Sentinel-2 | Canada
You and Dong (2020) [70] | classification | RF | Google Earth, Sentinel-1, Sentinel-2 MSI | China
Poortinga et al. (2021) [71] | segmentation | U-Net | NICFI Planet | Thailand
Adrian et al. (2021) [72] * | classification, segmentation | DnCNN, RF, SegNet, U-Net, 3D U-Net | Sentinel-1, Sentinel-2, WorldView 3 | United States
Cao et al. (2021) [73] | regression | CNN, DNN, LSTM, RF | MOD13A2, SRTM DEM | China
Luo et al. (2021) [74] | classification | RF, SNIC | Sentinel-1 | China
Ni et al. (2021) [75] | classification | SVM | Sentinel-2 | China
Sun et al. (2022) [76] | regression | RF | FROM-GLC10, MOD15A3, Sentinel-2 | China
Li et al. (2022) [77] | classification, segmentation | RF, SNIC | Google Earth, Sentinel-2 | China
Han et al. (2022) [78] | classification | RF | Landsat 5 TM, Landsat 8 OLI, Sentinel-2 | China
Hedayati et al. (2022) [79] | classification | fuzzy rules, Maximum Likelihood | ALOS PALSAR, Google Earth, Landsat 8, MYD11A2 | Iran
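Several of the cropland studies above (e.g., Rudiyanto et al. [60] and Wang et al. [61]) compare unsupervised clustering methods such as k-means against supervised classifiers. The sketch below is a minimal pure-NumPy k-means on synthetic two-band samples; it illustrates the generic algorithm only, and is not code from any cited study or from the GEE API (which exposes clustering through `ee.Clusterer.wekaKMeans`). The feature values are fabricated for illustration.

```python
import numpy as np

def kmeans(X, k, iters=25, seed=0):
    """Plain k-means: alternate nearest-center assignment and center update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # squared Euclidean distance from every sample to every center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Two synthetic "crop" clusters in a 2-D feature space
# (think of the axes as, say, an optical index and SAR backscatter).
rng = np.random.default_rng(1)
a = rng.normal([0.2, 0.8], 0.03, size=(150, 2))
b = rng.normal([0.7, 0.3], 0.03, size=(150, 2))
X = np.vstack([a, b])
labels, centers = kmeans(X, k=2)
```

In GEE the same idea runs server-side: a clusterer is trained on a sample of pixels and then applied to the full image, so only the cluster map is ever materialized.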
Table 2. Studies targeting LULC from RS imagery using AI (Note that references marked with * denote novel methods and are detailed in Section 3.3).
References | Method | Model Comparison | RS Data Type | Study Area
Azzari and Lobell (2017) [80] | classification | RF | Landsat 8 OLI TOA, Sentinel-2 MSI TOA, SRTM DEM | Zambia
Midekisa et al. (2017) [81] | classification | RF | DMSP NTL, Globeland30, Hansen Global Forest Change, Landsat 7 ETM+ | Africa (continent)
Hu et al. (2018) [82] | classification | CART | GlobCover, Landsat 5 TM, Landsat 8 OLI | China
Ge et al. (2019) [83] | classification | Bayesian hierarchical model, RF | ALOS DSM, Landsat 8 OLI, VIIRS NTL | China
Lee et al. (2018) [84] * | classification | BULC-U | GlobCover, Landsat 5 | Brazil
Zurqani et al. (2018) [85] | classification | RF | Landsat 5 SR, Landsat 5 TOA, Landsat 8 SR, Landsat 8 TOA, NAIP, NLCD, SRTM DEM, USGS Watershed Boundary Dataset | United States
Murray et al. (2018) [86] * | classification | RF | Landsat 7 ETM+, Landsat 8 OLI, Landsat 8 SR, SRTM DEM | Global
Mardani et al. (2019) [87] | classification | BT, SVM | FAO land cover, Sentinel-2 | Lesotho
Gong et al. (2019) [88] | classification | RF | FROM-GLC, Landsat 8, Sentinel-2, SRTM DEM | Global
Hao et al. (2019) [89] | classification | CART | GLDAS, GlobeLand30, Landsat 8 OLI, MOD11A2, MOD13A2 | China
Miettinen et al. (2019) [90] | classification | DT, maximum likelihood, RF | ALOS PALSAR-2, Landsat 7 ETM+, Landsat 8 OLI, Sentinel-1, SRTM DEM | Brunei, Indonesia, Malaysia, Singapore, Timor-Leste
Xie et al. (2019) [91] | classification | RF | Global Field Photo Library, Google Earth, Landsat 5 TM, Landsat 7 ETM+, MCD12Q1, MCD43A4 | China
Adepoju and Adelabu (2020) [92] | classification | CART, gmoMaxEnt, RF, SVM | Google Earth, Landsat 8 OLI, SRTM DEM | South Africa
Ghorbanian et al. (2020) [93] | classification | RF | Google Earth, Sentinel-1, Sentinel-2 | Iran
Liang et al. (2020) [94] * | classification | CART, MD, RF | Google Earth, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI, SRTM DEM | China
Zeng et al. (2020) [95] | classification | RF | GFSAD, GHSL, Hansen Global Forest Change, Landsat 8 OLI, Sentinel-1 GRD, SRTM DEM | South Africa
Naboureh et al. (2020) [96] | classification | RF | Landsat 8 OLI, SRTM DEM | Iran
Naboureh et al. (2020) [97] | classification | SVM, RUESVM | Google Earth, Sentinel-2 | China, Iran
Li et al. (2020) [98] | classification | RF | FROM-GLC10, GHSL, Landsat 8 OLI, MYD11A2, Sentinel-2 MSI, SRTM DEM, Suomi-NPP NTL | Africa (continent)
Huang et al. (2020) [99] | classification | RF | CCI-LC, FROM-GLC, Google Earth, Landsat 5 TM | Global
Tassi and Vizzari (2020) [100] | classification, segmentation | RF, SNIC, SVM | Landsat 8, PlanetScope, Sentinel-2 | Italy
Shetty et al. (2021) [101] | classification | CART, RF, RVM, SVM | BCLL, GlobCover, Google Earth, Landsat 8 OLI | India
Feizizadeh et al. (2021) [102] | classification | CART, RF, SVM | aerial photography, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI | Iran
Shafizadeh-Moghadam et al. (2021) [103] | classification, segmentation | RF, SNIC | Google Earth, Landsat 8 | Iran, Iraq, Kuwait, Saudi Arabia, Syria, Turkey
Pan et al. (2021) [104] | classification | CART, RF | Landsat 5 TM, MCD12Q1, SRTM DEM | Australia, United States
Becker et al. (2021) [105] | classification | RF | Landsat 8 | Brazil
Jin et al. (2022) [106] | classification, segmentation | RF, SNIC | ALOS PALSAR, CCI-LC, CGLS-LC, FROM-GLC, GFSAD30, GHSL, JRC Global Surface Water, Landsat 7 ETM+, Landsat 8 OLI, MCD12Q1 | Asia (multiple countries)
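Random forest (RF) dominates the model comparisons in the LULC studies above, typically invoked in GEE as `ee.Classifier.smileRandomForest` (with SVM available via `ee.Classifier.libsvm`). The sketch below reproduces the shape of such a comparison outside GEE, using scikit-learn on synthetic multi-band "spectral" samples; the data, class structure, and resulting accuracies are illustrative only and are not taken from any cited study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic training data: 6 "bands", 3 land-cover classes with Gaussian noise
# around randomly drawn class-mean reflectances.
rng = np.random.default_rng(0)
n_per_class, n_bands = 200, 6
centers = rng.uniform(0.0, 1.0, size=(3, n_bands))
X = np.vstack([c + 0.05 * rng.standard_normal((n_per_class, n_bands)) for c in centers])
y = np.repeat([0, 1, 2], n_per_class)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)

# The three classifiers most often compared in the tables above.
models = {
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(kernel="rbf"),
    "CART": DecisionTreeClassifier(random_state=0),
}
scores = {name: accuracy_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
          for name, m in models.items()}
for name, acc in scores.items():
    print(f"{name}: {acc:.2f}")
```

In a real GEE workflow the training table would come from `ee.Image.sampleRegions` over labeled polygons, and accuracy from a held-out sample via `errorMatrix`; the comparison logic is otherwise the same.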
Table 3. Studies targeting forest change and deforestation from RS imagery using AI.
References | Method | Model Comparison | RS Data Type | Study Area
Lee et al. (2016) [107] | classification | CART, MD, RF | Landsat 8 | Indonesia
Wang et al. (2019) [15] | classification | RF | ALOS PALSAR, GlobeLand30-2010, Hansen Global Forest Change dataset, JRC Yearly Water Classification History, Landsat 5 TM, Landsat 7 ETM+, RapidEye, TerraClass-2010, USGS Global Tree Cover 2010 | Brazil
Voight et al. (2019) [108] | classification | CART, Markov Chain model, MLP | Google Earth, Landsat MSS, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI | Belize
Koskinen et al. (2019) [109] | classification | CART, RF, SVM | ALOS PALSAR, Google Earth, Landsat 8 OLI, NAFORMA, Sentinel-1, Sentinel-2 MSI, SRTM DEM | Tanzania
Duan et al. (2019) [110] | classification | RF | Google Earth, Sentinel-2 | China
Poortinga et al. (2019) [111] | classification | DT, Monte Carlo, RF | ALOS GDSM, Landsat 8, PlanetScope, RapidEye, Sentinel-1, Sentinel-2 | Myanmar
Shimizu et al. (2019) [112] | classification | RF | Google Earth, Landsat 8, MCD12Q1, PlanetScope, RapidEye, Sentinel-1 | Myanmar
Ramdani (2019) [113] | classification | GMM, KNN, RF, SVM | Sentinel-1, SRTM DEM | Indonesia
Çolak et al. (2019) [114] | classification | SVM | CORINE LULC, Sentinel-1, Sentinel-2 | Turkey
Shaharum et al. (2020) [115] | classification | CART, RF, SVM | Google Earth, Landsat 8, SRTM DEM | Malaysia
de Sousa et al. (2020) [116] | classification | RF | ALOS PALSAR, Landsat 8 OLI, SRTM DEM | Gabon, Liberia
Brovelli et al. (2020) [117] | classification | ANN, RF | CBERS 2B, CBERS 4, Landsat 5, Landsat 7, Landsat 8, Sentinel-2 | Brazil
Kamal et al. (2020) [118] | classification | SVM | Landsat 8 OLI | Indonesia
Wei et al. (2020) [119] | classification | binomial logistic regression | AW3D30, CHELSA V1.2, GeoEye-1, GMTED2010, Google Earth, Hansen Global Forest Change, Landsat 5, NAIP | United States
Praticò et al. (2021) [120] | classification | CART, k-means, RF, SVM | Sentinel-2 | Italy
Xie et al. (2021) [121] | classification | RF | Sentinel-1, Sentinel-2, SRTM DEM | China
Floreano and de Moraes (2021) [122] | classification | Markov-CA, MLP, RF | Google Earth Pro, Landsat 5 TM, Landsat 7 ETM, Landsat 8 OLI | Brazil
Kumar et al. (2022) [123] | classification | RF | Forest Survey of India, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI, MCD12Q1 | India
Zhao et al. (2022) [124] | classification, segmentation | LandTrendr, RF, U-Net | Google Earth Pro, Hansen Global Forest Change, MTBS, MCD64A1, Planet, Sentinel-1, SRTM DEM | Brazil, United States
Wimberly et al. (2022) [125] | classification, segmentation | LandTrendr, RF | Google Earth, Landsat 7 ETM+, Landsat 8 OLI, WorldView | Ghana
Table 4. Studies targeting vegetation mapping from RS imagery using AI (Note that references marked with * denote novel methods and are detailed in Section 3.3).
References | Method | Model Comparison | RS Data Type | Study Area
Johansen et al. (2015) [126] | classification | CART, RF, NDVI, Foliage Projective Cover | Landsat 5 TM, Landsat 7 ETM+ | Australia
Traganos et al. (2018) [127] | classification | CART, RF, SVM | Sentinel-2 L1C TOA | Greece
Tsai et al. (2018) [128] | classification | DT, RF | Landsat 7 TM, Landsat 8 OLI | China
Jansen et al. (2018) [129] | regression | multiple linear regression, polynomial linear regression | Landsat 7 ETM+, Landsat 8 OLI, USGS National Elevation Dataset | United States
Jones et al. (2018) [130] | classification | RF | Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI, USGS National Elevation Dataset | United States
Campos-Taberner et al. (2018) [131] | regression | RF | BELMANIP2, MCD15A3H, MCD43A4 | Global
Xin and Adler (2019) [132] | classification | FCNN, CNN-LSTM hybrid | Sentinel-2 MSI | United States
Parente et al. (2019) [43] | classification, segmentation | LSTM, RF, U-Net | PlanetScope | Brazil
Parente et al. (2019) [133] | classification | RF | Google Earth, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI, MOD13Q1 | Brazil
Zhang et al. (2019) [134] | classification | RF | Google Earth, Landsat 8 OLI | China
Alencar et al. (2020) [135] * | classification | DT, RF | Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI | Brazil
Zhou et al. (2020) [21] | regression | CART, CNB, MLP, RF, SVM | Landsat 8 OLI, MCD43A1, MCD43A4 | United States
Tian et al. (2020) [136] | classification | SAE, SVM | Google Earth, Landsat 5 TM, Landsat 8 OLI, Pleiades 2, QuickBird, SPOT 4, SPOT 6, UAS, WorldView 1, WorldView 3 | China
Srinet et al. (2020) [137] | classification | RF | MOD09A1, SRTM DEM, WorldClim V2 Bioclim | India
Long et al. (2021) [138] * | classification, segmentation | CART, LandTrendr, MD, NB, RF, SVM | CGLS-LC100, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI, Sentinel-1, Sentinel-2, SRTM DEM | China
Yan et al. (2021) [139] | classification | RF | Gaofen-2, Landsat 4 TM, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI, Pléiades A, QuickBird, UAS, WorldView 2 | China
Wu et al. (2021) [140] | classification | RF | Gaofen-2, Landsat 8 OLI | China
Pipia et al. (2021) [141] * | regression | GPR | HyMap, Sentinel-2 | Europe (multiple countries)
Table 5. Studies targeting water body detection from RS imagery using AI (Note that references marked with * denote novel methods and are detailed in Section 3.3).
References | Method | Model Comparison | RS Data Type | Study Area
Pekel et al. (2016) [32] * | classification | expert system | Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI | Global
Zou et al. (2017) [142] | regression | multiple linear regression | Global Inland Water, Landsat 5, Landsat 7, NLCD | United States
Chen et al. (2017) [143] | segmentation | non-local active contour algorithm | Gaofen-1, Google Earth, Landsat 8 OLI, SRTM DEM | Tibet
Wang et al. (2018) [144] | classification | RF | JRC Global Surface Water, Landsat 4 TM, Landsat 5 TM, Landsat 8 OLI | China
Lin et al. (2018) [145] | regression | BRT, multiple linear regression, nonlinear general additive models, RF | Landsat 5 TM, Landsat 7 ETM+ | United States
Griffin et al. (2018) [146] | regression | multiple linear regression | Landsat 5 TM, Landsat 7 ETM+, NASA GSFC Ozone Monitoring Instrument | Canada, Russia, United States
Isikdogan et al. (2019) [147] * | segmentation | DeepWaterMapv, DeepWaterMap, MNDWI, MLP | Landsat 8 | Global
Fang et al. (2019) [148] | regression | linear regression, polynomial regression | China Lake Dataset, China’s Ecosystem Assessment and Ecological Security Pattern Database, Global Lakes and Wetlands, Global Reservoir and Dam Database, HydroLakes, Hydroweb, JRC Global Surface Water, SRTM DEM | China
Fuentes et al. (2019) [149] | regression | CART | JRC Global Surface Water, Landsat 5, LiDAR DTM, USGS National Elevation Dataset | Australia, United States
Markert et al. (2020) [150] | segmentation | Bmax Otsu thresholding, Edge Otsu thresholding | MERIT DEM, PlanetScope, Sentinel-1 GRD | Cambodia, Myanmar
Wang et al. (2020) [151] * | classification | MNDWI, MSCNN, RF | Google Earth, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI | China
Peterson et al. (2020) [152] | regression | DNN, ELR, MLR, SVR | GREON, Landsat 8, Sentinel-2 | United States
Wang et al. (2020) [153] | regression | CART | JRC Global Surface Water, Landsat 5, LiDAR DTM, USGS National Elevation Dataset | Australia, United States
Boothroyd et al. (2021) [154] | classification | RivaMap | Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI | Philippines
Weber et al. (2020) [155] | regression | maximum likelihood, multiple linear regression, RF, SVM | NAIP, National Hydrography Dataset, NLCD, National Wetland Inventory, Sentinel-2 | United States
Mayer et al. (2021) [156] * | segmentation | U-Net | JRC Global Surface Water datasets, PlanetScope, Sentinel-1 | Cambodia
Li et al. (2021) [157] | classification | NDWI, MNDWI, MuWI-R, Otsu thresholding, SVM | Sentinel-2 | Sri Lanka
Li and Niu (2022) [158] | classification | RF | ALOS DSM, China Lake Dataset, China Wetlands Map, Google Earth, Global Reservoir and Dam Database, Global Surface Water, Sentinel-1, Sentinel-2, SRTM DEM | China
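Otsu thresholding of a water index (NDWI or MNDWI) recurs throughout the surface-water studies above (e.g., Markert et al. [150], Li et al. [157]). The sketch below applies a self-contained Otsu implementation to a synthetic NDWI image; in GEE itself the histogram would typically be computed server-side with `ee.Image.reduceRegion` and `ee.Reducer.histogram`. The band values here are fabricated for illustration.

```python
import numpy as np

def otsu_threshold(values, nbins=256):
    """Otsu's method: pick the threshold maximizing between-class variance."""
    hist, edges = np.histogram(values.ravel(), bins=nbins)
    mids = 0.5 * (edges[:-1] + edges[1:])
    p = hist / hist.sum()
    omega = np.cumsum(p)           # probability mass of the "below" class
    mu = np.cumsum(p * mids)       # cumulative first moment
    mu_t = mu[-1]                  # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    return mids[np.nanargmax(sigma_b)]

# Synthetic bimodal scene: left half water (high NDWI), right half land (low NDWI).
rng = np.random.default_rng(2)
ndwi = np.empty((100, 100))
ndwi[:, :50] = 0.6 + 0.05 * rng.standard_normal((100, 50))   # water mode
ndwi[:, 50:] = -0.4 + 0.05 * rng.standard_normal((100, 50))  # land mode

t = otsu_threshold(ndwi)
water = ndwi > t   # water pixels sit above the Otsu threshold
print(f"threshold = {t:.2f}, water fraction = {water.mean():.2f}")
```

Because Otsu is fully data-driven, the same workflow adapts per scene; the "Bmax" and "Edge" variants in Markert et al. [150] differ mainly in how the histogram sample is drawn.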
Table 6. Studies targeting wetland mapping from RS imagery using AI.
References | Method | Model Comparison | RS Data Type | Study Area
Hird et al. (2017) [35] | classification | BRT | LiDAR DTM, Sentinel-1, Sentinel-2 | Canada
Farda (2017) [159] | classification | CART, Fast NB, GMO Max Entropy, IKPamir, MLP, Margin SVM, Pegasos, RF, Voting SVM, Winnow | Landsat 3 MMS, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI, ASTER GDEM | Indonesia
Amani et al. (2019) [160] | classification, segmentation | RF, SNIC | Landsat 8 | Canada
Mahdianpari et al. (2018) [161] | classification | RF | Sentinel-1, Sentinel-2 | Canada
DeLancey et al. (2019) [162] | classification | BRT | LiDAR DEM, Sentinel-1, Sentinel-2, SRTM DEM | Canada
Wu et al. (2019) [163] | classification | k-means | NAIP, JRC Global Surface Water datasets, LiDAR DEMs, National Wetlands Inventory (NWI) | Canada, United States
Amani et al. (2019) [17] | classification | RF | Canadian DEM, Landsat 8, Sentinel-1 | Canada
Zhang et al. (2019) [164] | classification | RF | Google Earth Pro, Landsat 8 OLI | China
Mahdianpari et al. (2020) [165] | classification, segmentation | RF, SNIC | aerial photography, Google Earth, Sentinel-1, Sentinel-2 | Canada
Hakdaoui et al. (2020) [166] | classification | RF | ASTER DEM, Landsat 5 TM, Sentinel-1 GRD, Sentinel-2 MSI | Morocco
DeLancey et al. (2019) [167] | classification | U-Net, XGBoost | ALOS DEM, Sentinel-1, Sentinel-2 | Canada
Mahdianpari et al. (2020) [168] | classification | RF | Canada’s Annual Crop Inventory, Google Earth, Pleiades, Sentinel-1, Sentinel-2, WorldView 2 | Canada
Wang et al. (2020) [169] | classification | DT | Google Earth, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI | China
Mahdianpari et al. (2020) [170] | classification | CART, MD, RF | Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI | Canada
Sahour et al. (2021) [171] | classification | RF, SVM | aerial photography, Google Earth, JRC Global Surface Water, Sentinel-1, Sentinel-2 | United States
Jia et al. (2021) [172] | segmentation | Otsu’s thresholding algorithm | DJI Phantom 4 Pro, Gaofen-2, Google Earth, Sentinel-2 | China
Table 7. Studies targeting infrastructure and building detection from RS imagery using AI (Note that references marked with * denote novel methods and are detailed in Section 3.3).
References | Method | Model Comparison | RS Data Type | Study Area
Goldblatt et al. (2016) [178] | classification | CART, RF, SVM | Google Earth, Landsat 7 ETM+, Landsat 8, WorldPop | India
Huang et al. (2018) [179] | classification | BRT | Google Earth, Landsat 7 ETM+, Landsat 8 OLI | China
Xu et al. (2019) [180] | classification, segmentation | LandTrendr, RF | FROM-GLC, GHSL, Google Earth, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI | China
Zhong et al. (2019) [181] | regression | cubic regression | GPP, GOME-2, Google Earth Pro, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI, MOD09A1 | China
Lin et al. (2020) [182] | classification | RF | DMSP NTL, GHSL, GlobeLand30, Google Earth, Landsat 8, Sentinel-1, SRTM DEM, VIIRS NTL | China
Liu et al. (2020) [183] | classification, segmentation | CART, Otsu’s thresholding algorithm, RF | Geo-Wiki, GHSL, GlobeLand30, Google Earth, Hansen Global Forest Change, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI, OpenStreetMap, SRTM DEM | China
Mugiraneza et al. (2020) [184] | classification, segmentation | LandTrendr, SVM | Google Earth, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI | Rwanda
Lin et al. (2021) [185] * | classification | CART, gmoMaxEnt, NB, RF, SVM | Landsat 8 OLI | China
Carneiro et al. (2021) [186] | classification | RF | Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI, Sentinel-2, SRTM DEM | Brazil
Zhang et al. (2021) [187] | classification | RF | Landsat 8 OLI | China
Samat et al. (2022) [188] | classification | SVM | GCL-FCS30-2020, GHSL, Google Earth, Sentinel-2 | China
Table 8. Studies targeting wildfires from RS imagery using AI (Note that references marked with * denote novel methods and are detailed in Section 3.3).
References | Method | Model Comparison | RS Data Type | Study Area
Parks et al. (2019) [189] | regression | RF | Landsat 4 TM, Landsat 5 TM, Landsat 7 ETM+, Landsat 8 OLI | Canada, United States
Quintero et al. (2019) [190] | segmentation | FormaTrend, LandTrendr | Landsat 5 TM, Landsat ETM+, Landsat OLI, MCD64A1, SRTM DEM | Spain
Long et al. (2019) [191] * | classification | RF, SVM | CBERS-4 MUX, FireCCI51, Gaofen-1 WFV, GFED4, Google Earth, MCD12C1, MOD44B, MTBS, Landsat 8 | Global
Bar et al. (2020) [192] | classification | CART, RF, SVM, Weka clustering | FireCCI51, IRS 1C, Landsat 5, Landsat 8 OLI, MODIS, ResourceSat 2, Sentinel-2, VIIRS | India
Sulova and Jokar Arsanjani (2020) [193] | classification | CART, NB, RF | CGLS-LC100, FIRMS, MOD13Q1, Sentinel-2, SRTM DEM | Australia
Zhang et al. (2020) [194] | classification | RF | Landsat 5 | Global
Seydi et al. (2021) [195] | classification | KNN, RF, SVM | Landsat 8, MODIS, Sentinel-2 | Australia
Arruda et al. (2021) [196] | classification | DNN | INPE, Landsat 8 OLI, MODIS | Brazil
Table 9. Studies targeting heavy industry and pollution from RS imagery using AI.
References | Method | Model Comparison | RS Data Type | Study Area
Waller et al. (2018) [197] | regression | RF | DART, Google Earth Pro, Landsat 5 TM, NLCD | United States
Lobo et al. (2018) [198]classification