Article

Scale-Specific Prediction of Topsoil Organic Carbon Contents Using Terrain Attributes and SCMaP Soil Reflectance Composites

1
Julius Kühn Institute (JKI), Federal Research Centre for Cultivated Plants, Institute for Crop and Soil Science, Bundesallee 58, 38116 Braunschweig, Germany
2
German Aerospace Center (DLR), German Remote Sensing Data Center (DFD), Münchener Str. 20, 82234 Wessling, Germany
3
Bavarian State Research Center for Agriculture, Institute for Organic Farming, Soil and Resource Management, Lange Point 6, 85354 Freising, Germany
4
German Aerospace Center (DLR), Remote Sensing Technology Institute (IMF), Münchener Str. 20, 82234 Wessling, Germany
*
Author to whom correspondence should be addressed.
Remote Sens. 2022, 14(10), 2295; https://doi.org/10.3390/rs14102295
Submission received: 14 March 2022 / Revised: 3 May 2022 / Accepted: 4 May 2022 / Published: 10 May 2022
(This article belongs to the Special Issue Remote Sensing for Soil Organic Carbon Mapping and Monitoring)

Abstract

There is a growing need for area-wide knowledge of SOC contents in agricultural soils at the field scale, both for food security and for monitoring long-term changes related to soil health and climate change. In Germany, SOC maps are mostly available with a spatial resolution of 250 m to 1 km². The nationwide availability of both digital elevation models at various spatial resolutions and multi-temporal satellite imagery enables the derivation of multi-scale terrain attributes and (here: Landsat-based) multi-temporal soil reflectance composites (SRC) as explanatory variables. Using the example of a Bavarian test area of about 8000 km², relations between 220 SOC content samples and different aggregation levels of the explanatory variables were analyzed for their scale-specific predictive power. The aggregation levels were generated by applying a region-growing segmentation procedure, and the SOC content prediction was realized by the Random Forest algorithm. In doing so, established approaches of (geographic) object-based image analysis (GEOBIA) and machine learning were combined. The modeling results revealed scale-specific differences. Compared to terrain attributes, the use of SRC parameters leads to a significant model improvement at field-related scale levels. The joint use of both terrain attributes and SRC parameters resulted in further model improvements. The best modeling variant is characterized by an accuracy of $R^2 = 0.84$ and RMSE = 1.99.

1. Introduction

Soil is the largest carbon sink on earth after the oceans and can store more than twice as much CO₂ as the atmosphere [1]. Therefore, the soil of agricultural ecosystems can contribute to the mitigation of greenhouse gas (GHG) emissions and thus to climate change mitigation through increased carbon sequestration [2]. In order to assess this potential and promote it through adaptation of land use systems, as well as to localize adaptation needs on an area-by-area basis in the context of the European Common Agricultural Policy (CAP) and the Sustainable Development Goals (SDGs), up-to-date, area-wide, and high-resolution information on carbon contents of agricultural soils is needed [3,4]. Germany-wide maps of the carbon content of agricultural soils are currently only available as static maps with a spatial resolution of 200 m² to 1 km² [5]. The maps are not suitable as a basis for small-scale field-specific analyses. In addition, the maps do not contain quality measures that are important for communicating model uncertainties [6,7].
Detailed information on carbon content is available in the form of point soil samples collected at the state, national or European level (e.g., [8,9,10]). The data sets differ in sampling methodology, frequency, and density as well as in their representativeness. Explanatory variables are needed for the operational transformation of point data into spatial data sets. Since digital elevation models became available nationwide at various spatial resolutions, digital soil mapping has transitioned from the research phase to operational use [11]. The increasing availability of multi-temporal satellite imagery allows an expansion of the data space to distinguish both spatial and temporal patterns of SOC content [4,12]. Multi-temporal soil reflectance composites (SRC), based on Landsat or Sentinel-2 time series, have proven useful as explanatory variables for the prediction of (top)soil organic carbon (SOC) content [5,13,14,15,16,17,18,19,20].
In this article, we extend a study by Zepp et al. (2021), which applied different modeling methods to Landsat-based SRC data for SOC content prediction in Bavaria, Germany [5]. In that study, Random Forest (RF) showed the best predictive capabilities in terms of model accuracy and performance. Using a sub-area within Bavaria as an example, we extend the modeling approach and compare the individual and joint predictive capabilities of Landsat-based SRCs and multi-scale terrain attributes. The latter take into account the fact that soil properties and soil-forming processes are an expression of complex relationships between soil-forming factors and landforms, which occur on different scales [21,22,23,24,25,26,27,28]. Multi-scale terrain attributes enable the consideration of contextual information, which can improve the prediction accuracy of soil properties [24]. In addition, different aggregation levels of the two parameter sets were generated. The segmentation algorithm used results in spatial objects with soil-related meaning [29,30], which are also referred to as soil-terrain objects [6] or ecotopes [31]. They can be defined as groups of terrain attribute raster cells which are aggregated to landform elements according to a scale-specific homogeneity [29,32,33]. Their use in digital soil mapping applications has proven superior to pixel-based approaches [30,34,35]. The main objective of this study is to analyze the relations between SOC content samples and different aggregation levels of terrain attributes and Landsat-based SRC data with regard to their scale-specific predictive power.

2. Materials and Methods

Figure 1 illustrates the principal digital soil modeling workflow, in which scale-specific reference units (RU) with explanatory variables are related to soil measurements and analyzed with machine learning methods. The workflow can be divided into two categories:
  • “Input data” comprises the provision of soil samples (Section 2.1.1) as well as the derivation of terrain attributes (Section 2.1.2) and multi-temporal SRCs (Section 2.1.3). By applying a segmentation algorithm (Section 2.1.4), both data types are used for generating multi-hierarchical reference units (RU), which are parameterized by applying zonal statistics operations (Section 2.1.5).
  • “Machine learning” refers to the actual spatial SOC content prediction by applying the Random Forest algorithm. In addition, an internal and independent validation schema, as well as a recursive feature elimination analysis, is included (Section 2.2).
The workflow was implemented using R functions [36] documented in a GitHub repository (https://github.com/FLFgit/ScaleP.git; accessed on 13 March 2022).

2.1. Input Data

2.1.1. Soil samples

The test area in Bavaria (Figure 2) was selected because its range of SOC content values is comparable with that of the entire state (Figure 3). The data set comprises soils with well-developed B horizons (mainly Cambisols), soils with initial soil formation (mainly Leptosols), soils with water stagnation (mainly Stagnosols and Planosols), soils with clay migration (mainly Luvisols), clay-rich soils (mainly Vertisols), groundwater soils (mainly Gleysols), and natural bogs and fens (mainly Histosols) according to the German soil classification system and the equivalent reference soil groups of the WRB system [37]. Both mineral soils with lower SOC contents and organic soils in the form of fens (e.g., Königsmoos) with higher SOC contents occur in the test area. In addition, the terrain composition of the test area is comparably heterogeneous to that of Bavaria as a whole.
Figure 2 shows the spatial distribution of available sampling sites throughout Bavaria (N = 939) and in the test area of about 8000 km² (N = 220) selected for modeling, of which agricultural land accounts for about half. The available soil samples were provided by the Bavarian Environment Agency (LfU) and the Bavarian State Research Center for Agriculture (LfL). The SOC contents in both databases were determined by dry combustion using elemental analyzers [8,9]. The final soil data set of the test site comprises 220 soil samples (LfL: 14 samples; LfU: 206 samples). The SOC contents range from 0.74% to 18.3% with a median content of 2.00%. The comparison of both value distributions reveals that differences mainly occur in the value range between 3 and 10% (Figure 3). Applying the nonparametric Kolmogorov–Smirnov goodness-of-fit test [38] revealed that the empirical cumulative distributions are significantly different, although the maximum distance between the curves (D = 0.24) is rather small (cf., [6]).
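The distance D reported by the Kolmogorov–Smirnov test is the largest vertical gap between two empirical cumulative distribution functions. The following is a minimal sketch of that statistic only (not the significance test), written in plain Python rather than the R functions used in the study. The function name `ks_distance` and the sample values are hypothetical and serve illustration only.

```python
from bisect import bisect_right


def ks_distance(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov distance D = max |F_a(x) - F_b(x)|."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # Both ECDFs are step functions that only change at observed values,
    # so evaluating them at every pooled observation is sufficient.
    for x in a + b:
        f_a = bisect_right(a, x) / len(a)  # share of sample_a that is <= x
        f_b = bisect_right(b, x) / len(b)  # share of sample_b that is <= x
        d = max(d, abs(f_a - f_b))
    return d


# Hypothetical SOC contents in percent (illustrative values, not study data)
soc_state = [0.9, 1.2, 1.5, 1.8, 2.1, 2.4, 3.0, 4.5]
soc_subset = [1.0, 1.6, 2.0, 2.2, 5.0, 9.0]
print(ks_distance(soc_state, soc_subset))
```

A value of D close to 0 indicates nearly identical distributions, while D = 1 means the two samples do not overlap at all.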

2.1.2. Terrain Attributes

Terrain attributes have been established for decades as explanatory variables for predicting soil properties in general [11,39] and SOC content in particular [40]. Since the scale dependency of terrain attributes has an effect on SOC predictions (e.g., [23,26]), multi-scale terrain attributes have been derived (Table 1). This concerns in particular variants of the attributes “Normalized Height” (NH; Figure 4a,b) and “Topographic Position Index” (TPI) [41,42], for whose calculation different moving window sizes were applied. The variants of the attributes “Vertical Distance above Channel Network” (VDC) and “Terrain Classification Index” (TCI) [41] are based on different aggregation levels of the channel network derivations. The “Mass Balance Index” (MBI) versions are expressions of the differentiability regarding dominant and subdominant relief shapes [21]. All multi-scale variants are based on tuning parameters listed in Table 1. Their definition is the result of exponential functions for which the start and end values were determined empirically (see https://github.com/FLFgit/ScaleP/blob/master/callScaleP.R; accessed on 13 March 2022). Finally, the one-dimensional attributes including local attributes (sink-filled digital elevation model (FILL) and “Slope” (SLP)) as well as regional attributes (“Topographic Openness” (TOP and TON) [43] and the “Topographic Wetness Index” (TWI; Figure 4c) [44]) were calculated [45].
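To illustrate how one attribute can be evaluated at several scales, the sketch below generates an exponentially spaced series of window radii and computes a simple Topographic Position Index for each radius. It is a plain-Python illustration under stated assumptions: the TPI is taken as the cell elevation minus the mean elevation of a square neighbourhood, the start and end values are hypothetical, and neither function reproduces the tuning parameters of Table 1 or the implementation referenced above.

```python
def exponential_series(start, end, n):
    """Return n values growing geometrically from start to end (inclusive)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    if start <= 0 or end <= 0:
        raise ValueError("start and end must be positive")
    ratio = (end / start) ** (1.0 / (n - 1))
    return [start * ratio ** i for i in range(n)]


def tpi(dem, row, col, radius):
    """Cell elevation minus the mean elevation of its square neighbourhood.

    The window has a half-width of `radius` cells, excludes the centre cell
    and is clipped at the raster border.
    """
    if radius < 1:
        raise ValueError("radius must be at least 1")
    rows, cols = len(dem), len(dem[0])
    neighbours = [
        dem[r][c]
        for r in range(max(0, row - radius), min(rows, row + radius + 1))
        for c in range(max(0, col - radius), min(cols, col + radius + 1))
        if (r, c) != (row, col)
    ]
    return dem[row][col] - sum(neighbours) / len(neighbours)


# Hypothetical 5 x 5 elevation raster with a small hill in the centre
dem = [
    [10, 10, 10, 10, 10],
    [10, 12, 12, 12, 10],
    [10, 12, 15, 12, 10],
    [10, 12, 12, 12, 10],
    [10, 10, 10, 10, 10],
]
# Hypothetical start and end values; radii are rounded to whole cells
radii = sorted({max(1, round(v)) for v in exponential_series(1, 2, 2)})
for radius in radii:
    print(radius, tpi(dem, 2, 2, radius))
```

Positive values mark cells that lie above their surroundings at the given window size, and the same cell can change sign as the window grows, which is the scale dependency the multi-scale variants are meant to capture.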
The corresponding process chain is documented in an R function (Table 1). There, all terrain attribute variants are defined. In this study, a DEM with a resolution of 10 m was used, which is provided by the German Federal Agency of Cartography and Geodesy (https://gdz.bkg.bund.de; accessed on 13 March 2022). The DEM was resampled to 30 m according to the resolution of the SCMaP SRC data set (Section 2.1.3).

2.1.3. SCMaP-SRC

In addition to the terrain attributes, spectral information of a remote sensing soil reflectance composite (SRC) was used for SOC content modeling (Figure 4d). The Soil Composite Mapping Processing (SCMaP) chain enables the generation of SRCs for individually selected regions and time periods [48]. Based on a modified vegetation index (PV), two thresholds are determined to separate predominantly uncovered soils from all other land cover types. The development of the database for the threshold derivation is automated. The thresholds themselves have been derived based on manually defined percentile measures [49].
A 30-year (1984–2014) compositing period was chosen to obtain a smooth spectral database [5]. The compositing period was chosen according to the dates of soil sampling. The 30-year SRC was built on all Landsat collection scenes (69 Landsat-4 TM, 1784 Landsat-5 TM, and 998 Landsat-7 ETM+) [50] available between 1984 and 2014 with a resolution of 30 m for the investigation area. For all scenes, the same pre-processing steps were applied. The FMask algorithm was used to detect and remove clouds, cloud shadows, and pixels covered by snow [51,52]. Additionally, an atmospheric correction was performed using the Atmospheric Correction (ATCOR) software for satellite imagery [53]. The soil reflectance composites contain the per-pixel reflectance of exposed soils averaged over the observed time period. The patterns in the soil reflectance composite correspond to patterns of existing soil maps and the underlying geological structural region. The products therefore provide useful information on soils and exposed soil coverage. The resulting bands $SRC_{1}$ to $SRC_{7}$ represent the “normal” averaged reflectances, whereas for bands $SRC_{8}$ to $SRC_{14}$ the averaged reflectances are normalized per scene by the albedo, which is calculated as the mean reflectance over all six reflective bands.
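The difference between the plain and the albedo-normalized composite bands can be sketched for a single pixel as follows. This is a simplified illustration, not the SCMaP implementation: the function name `composite_pixel` is hypothetical, and it is assumed here that "normalized by the albedo" means dividing each band reflectance of a scene by that scene's mean reflectance before averaging over scenes.

```python
def composite_pixel(scenes):
    """Average one bare-soil pixel over several scenes.

    scenes: list of per-scene reflectance lists, one value per reflective band.
    Returns (mean reflectance per band, mean albedo-normalized value per band).
    """
    if not scenes:
        raise ValueError("at least one scene is required")
    n_bands = len(scenes[0])
    plain_sum = [0.0] * n_bands
    norm_sum = [0.0] * n_bands
    for bands in scenes:
        # Albedo of this scene: mean reflectance over all reflective bands
        albedo = sum(bands) / n_bands
        if albedo <= 0:
            raise ValueError("albedo must be positive")
        for i, value in enumerate(bands):
            plain_sum[i] += value
            norm_sum[i] += value / albedo  # assumption: normalization = division
    n = len(scenes)
    return [s / n for s in plain_sum], [s / n for s in norm_sum]


# Two hypothetical scenes of the same pixel; the second is twice as bright
scenes = [
    [0.1, 0.2, 0.3, 0.1, 0.2, 0.3],
    [0.2, 0.4, 0.6, 0.2, 0.4, 0.6],
]
plain, normalized = composite_pixel(scenes)
print(plain)
print(normalized)
```

Under this assumption the normalized bands keep the spectral shape but are insensitive to overall brightness differences between scenes, for example those caused by varying illumination or soil moisture.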

2.1.4. Segmentation

Different aggregation levels of explanatory variables were derived by applying a region-growing segmentation algorithm that spatially groups grid cells of terrain attributes and SRC bands according to their neighborhood in a feature space and raster. As a result, polygons with scale-specific comparable heterogeneity are generated.
The region-growing algorithm “Fractal Net Evolution Approach” (FNEA) was applied [54], which is implemented in the software eCognition and has been shown to be suitable for detecting spatial objects of soil-related relevance [6,30,34,35] in the context of geographic object-based image analysis (GEOBIA) [55,56,57]. The algorithm relies on seed pixel groups with the smallest (here: Euclidean) distance in both the raster and the feature space of the parameters used. Then, the seeds grow until the user-specified heterogeneity of the raster values within the resulting objects is reached.
In this study, the segmentation input data were the variables TWI, SLP, $TCI_{CA = 10,000}$, $MBI_{T = 0.0005}$, $NH_{t = 1000}$ as well as $SRC_{8}$ to $SRC_{14}$ (cf., Table 1). The shape of the resulting objects is influenced by the user-defined parameters “shape variance” and “compactness”, which have been set to 0.05 and 0.01 here. The degree of object aggregation is controlled by the parameter “Scale Level” L (cf., [30]). The corresponding 17 scale level values are listed in Table 1. As an example of a test site subset, Figure 5a displays polygons of three scale levels. There, red-colored polygon boundaries represent parent polygons, which are decomposed by the smaller yellow-colored child and blue-colored grandchild polygons. The latter can be viewed as vectorized raster cells [21,30]. The white areas were identified by the SCMaP algorithm as areas of no or little reflectance changes. This applies, for example, to forests or built-up areas.
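The general idea of region growing, and of a single threshold controlling the degree of aggregation, can be sketched with a strongly simplified example. The code below is not FNEA and not the eCognition implementation: it grows regions on a single-band raster by adding 4-connected neighbours whose value stays within a hypothetical tolerance `max_diff` of the running region mean, and it ignores the shape and compactness criteria.

```python
from collections import deque


def grow_regions(grid, max_diff):
    """Label 4-connected regions whose cells stay close to the region mean.

    Returns (label raster, number of regions). A larger max_diff yields
    fewer and larger objects, loosely analogous to a coarser scale level.
    """
    rows, cols = len(grid), len(grid[0])
    labels = [[-1] * cols for _ in range(rows)]
    next_label = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0] != -1:
                continue  # cell already belongs to a region
            labels[r0][c0] = next_label
            total, count = grid[r0][c0], 1
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if not inside or labels[nr][nc] != -1:
                        continue
                    # Accept the neighbour if it is close to the current mean
                    if abs(grid[nr][nc] - total / count) <= max_diff:
                        labels[nr][nc] = next_label
                        total += grid[nr][nc]
                        count += 1
                        queue.append((nr, nc))
            next_label += 1
    return labels, next_label


# Hypothetical single-band raster with two clearly different halves
raster = [
    [1, 1, 9, 9],
    [1, 1, 9, 9],
]
for tolerance in (0.5, 10):
    labels, n_objects = grow_regions(raster, tolerance)
    print(tolerance, n_objects, labels)
```

With a tolerance of zero every distinct value forms its own object, which mirrors the statement that the finest objects can be viewed as vectorized raster cells.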

2.1.5. Parametrization

While the parametrization of scale-specific reference units (RU) is realized by applying zonal statistics functions [41,58], Zepp et al. (2021) applied a filter-based parameterization of the samples [5]. This involved averaging the spectra of the sample pixel and its eight neighboring pixels to reduce local and spatial variability. The background to this approach is that the SCMaP-SRC provides ground reflectance information at a pixel resolution of 30 m based on Landsat imagery. Linking point samples to a 30 m remote sensing pixel is a potential source of inaccuracy because not all samples are collected at least 30 m inside the fields. As a result, the SRC pixel may combine multiple surfaces with different spectral information. The ground sample is then combined with a mixture of spectral information. Therefore, spectral and spatial filtering was applied to the sample pixel and its eight neighboring pixels to evaluate spectral differences within a field cluster. All pixel clusters showing deviating spectra are excluded from further processing.
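The two parameterization strategies can be contrasted with a small plain-Python sketch. `zonal_mean` averages a raster per segmentation object, while `window_mean` averages a sample pixel with its neighbours in a 3 × 3 window. Both function names and the example rasters are hypothetical, only the mean is shown as zonal statistic, and the spectral outlier filtering described above is not reproduced.

```python
def zonal_mean(values, labels):
    """Mean raster value per object label (zonal statistics, mean only)."""
    sums, counts = {}, {}
    for value_row, label_row in zip(values, labels):
        for value, label in zip(value_row, label_row):
            sums[label] = sums.get(label, 0.0) + value
            counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def window_mean(values, row, col):
    """Mean of a pixel and its neighbours in a 3 x 3 window, clipped at edges."""
    rows, cols = len(values), len(values[0])
    cells = [
        values[r][c]
        for r in range(max(0, row - 1), min(rows, row + 2))
        for c in range(max(0, col - 1), min(cols, col + 2))
    ]
    return sum(cells) / len(cells)


# Hypothetical reflectance raster and object labels for two fields
reflectance = [
    [0.10, 0.12, 0.30],
    [0.11, 0.13, 0.32],
    [0.10, 0.12, 0.31],
]
objects = [
    [0, 0, 1],
    [0, 0, 1],
    [0, 0, 1],
]
print(zonal_mean(reflectance, objects))
print(window_mean(reflectance, 1, 1))
```

In this toy case the window around the centre pixel mixes values from both fields, whereas the zonal mean of object 0 uses only cells of the same object, which illustrates why object boundaries can matter for samples taken near field edges.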

2.2. Machine Learning

Random Forest (RF) is an ensemble-based decision tree algorithm for classification and regression [59], which has been regularly used for predicting soil properties [60,61]. In regression mode, each tree recursively splits the feature space of the explanatory variables, choosing at each node the split that minimizes the variance of the target variable within the resulting subsets. Based on bootstrap samples, RF generates a large number of such trees (ensemble). About two-thirds of the samples end up in each bootstrap sample and are used to grow the tree (in-bag data), while the remaining third (out-of-bag data) is used to calculate internal error estimates.
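The rough two-thirds to one-third proportion follows from drawing bootstrap samples with replacement. Assuming the standard bootstrap, in which n draws are made from n samples, the probability that a given sample appears in one bootstrap sample is:

```latex
P(\text{sample } i \text{ is in-bag})
  = 1 - \left(1 - \frac{1}{n}\right)^{n}
  \;\xrightarrow[\,n \to \infty\,]{}\; 1 - e^{-1} \approx 0.632,
\qquad
P(\text{sample } i \text{ is out-of-bag}) \approx 0.368 .
```

Each tree therefore sees roughly 63% of the distinct training samples, and the remaining 37% or so serve as its out-of-bag data.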
To validate the prediction results, the total data set is randomly divided into a training and a test data set of 75% and 25%, respectively, taking into account the target parameter distribution. Model building is based on the training data set, on which a calibration and a repeated 5-fold cross-validation are performed. The test data set is used for independent validation [62], to which the trained model is applied. Modeling performance is evaluated using the metrics “Root Mean Square Error” (RMSE) and “Coefficient of determination” ($R^2$) for cross-validation, calibration, and independent validation. The slope of the regression line (SLOPE) indicates the degree of underestimation or overestimation.
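The three accuracy metrics can be computed with a few lines of plain Python. This is a sketch under two assumptions that the text does not specify: $R^2$ is taken as the squared Pearson correlation between observed and predicted values, and SLOPE is the ordinary least squares slope of predicted on observed values. The function names and example values are hypothetical.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(_mean([(p - o) ** 2 for o, p in zip(observed, predicted)]))


def r_squared(observed, predicted):
    """Squared Pearson correlation (assumed definition of R^2)."""
    mo, mp = _mean(observed), _mean(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


def slope(observed, predicted):
    """Least squares slope of predicted on observed values (assumed direction)."""
    mo, mp = _mean(observed), _mean(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    return cov / var_o


# Hypothetical observed and predicted SOC contents in percent
observed = [1.0, 2.0, 3.0, 8.0, 15.0]
predicted = [1.4, 2.1, 2.9, 6.5, 11.0]
print(rmse(observed, predicted), r_squared(observed, predicted), slope(observed, predicted))
```

Under these assumptions a slope below 1 means that the predictions span a narrower range than the observations, so high SOC contents tend to be underestimated even when the correlation is strong.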
All explanatory variables used are more or less affected by multicollinearity. This concerns in particular the terrain attribute variations. RF can be considered tolerant of this phenomenon regarding the model prediction or the accuracy of the model [61]. However, collinearity might impair the interpretability of the model and may lead to misidentification of relevant predictors [63]. This especially concerns the interpretation of the variable importance of each explanatory variable, which is derived from the percent increase in mean squared error (MSE) resulting from the permutation of the out-of-bag data for each variable [64]. To ensure the interpretability of relevant predictors, the recursive feature elimination approach (RFE) is used [65,66], where the least important predictors are iteratively eliminated before the model is rebuilt [64].
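Both ideas, permutation-based importance and recursive elimination, can be sketched independently of any particular model. The code below is a generic plain-Python illustration, not the R workflow of the study: `predict` stands for any fitted model, `fit_and_rank` is a hypothetical callback that refits a model on a feature subset and returns its error together with the importances, and the importance is computed on the supplied rows rather than on out-of-bag data.

```python
import random


def mse(rows, targets, predict):
    """Mean squared error of predict(row) against the targets."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(rows, targets, predict, n_repeats=5, seed=0):
    """Percent increase in MSE after shuffling one feature column at a time.

    Requires a baseline MSE greater than zero.
    """
    rng = random.Random(seed)
    base = mse(rows, targets, predict)
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and target
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(100.0 * (mse(shuffled, targets, predict) - base) / base)
        importances.append(sum(increases) / n_repeats)
    return importances


def recursive_feature_elimination(features, fit_and_rank):
    """Drop the least important feature step by step and keep the best subset.

    fit_and_rank(features) must return (error, {feature: importance}).
    Returns ((best feature list, its error), full elimination history).
    """
    current = list(features)
    history = []
    while current:
        error, importance = fit_and_rank(current)
        history.append((list(current), error))
        if len(current) == 1:
            break
        weakest = min(current, key=lambda name: importance[name])
        current.remove(weakest)
    return min(history, key=lambda item: item[1]), history


# Hypothetical model that only uses the first of two features
rows = [[float(i), float((7 * i) % 10)] for i in range(10)]
targets = [2.0 * r[0] + (0.1 if i % 2 == 0 else -0.1) for i, r in enumerate(rows)]
print(permutation_importance(rows, targets, lambda r: 2.0 * r[0]))
```

A feature that the model ignores receives an importance of zero, whereas correlated features can share or mask each other's importance, which is the interpretability issue that motivates the stepwise elimination.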

3. Results

3.1. Filter-Based Parametrization

Table 2 lists the accuracy metrics resulting from applying the modeling approach to the samples for Bavaria [5] and the test site based on the filter-based parameterization. Figure 6 visualizes the corresponding validation scatter plots. The accuracy metrics include all RMSE and $R^2$ values of calibration (CAL), cross-validation (CV), and (independent) validation (VAL) (Section 2.2).
The different accuracy metrics reflect the differences between the distributions of SOC content values, which is also indicated by the Kolmogorov–Smirnov distance between the entire Bavarian and the subset data set used in this study (Section 2.1.1). The differences mainly concern the $RMSE_{CV}$, $RMSE_{CAL}$, $RMSE_{VAL}$ and $R^2_{VAL}$ values, with the metrics of the subarea being higher than those for all of Bavaria.

3.2. Scale-Specific Parametrization

Table 3 summarizes all scale-specific accuracy metrics. The naming of the aggregation levels follows the scale level specification within the eCognition software setting, which was varied during the application of the algorithm and controls the parameter heterogeneity and thus indirectly the number and sizes of the resulting polygons [21,30,67]. Figure 5b and Table 3 show the relation between object number ON and scale level L. Dividing the size of the agricultural area (about 4000 km²) by the number of objects gives the approximate average object size, which varies depending on the heterogeneity of the soil landscape [6,21].
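The average object size follows from a simple division, sketched below in plain Python. The agricultural area of about 4000 km² and the 30 m raster resolution are taken from the text, whereas the object numbers in the loop are hypothetical placeholders and not the values of Table 3.

```python
def average_object_size_ha(area_km2, object_number):
    """Average object size in hectares (1 km^2 = 100 ha)."""
    if object_number <= 0:
        raise ValueError("object_number must be positive")
    return area_km2 * 100.0 / object_number


def cell_count(area_km2, resolution_m):
    """Number of square raster cells needed to cover an area."""
    return area_km2 * 1_000_000 / (resolution_m ** 2)


agricultural_area_km2 = 4000  # approximate value stated in the text

# Upper bound for the object number if every 30 m cell were its own object
print(round(cell_count(agricultural_area_km2, 30)))

# Hypothetical object numbers for a fine, a medium and a coarse scale level
for object_number in (2_000_000, 200_000, 20_000):
    print(object_number, average_object_size_ha(agricultural_area_km2, object_number))
```

A single 30 m cell covers 0.09 ha, so average object sizes close to that value indicate that the segmentation has hardly aggregated any cells.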
The CAL, CV, and VAL-related accuracy metrics are derived for all scale levels and the parameter variants “terrain attributes” (A), “SCMaP-SRC” (B), and “terrain attributes + SCMaP-SRC” (C). The CAL and VAL value differences reflect the degree of model over-fitting [68], which correlates with the VAL values. It is also noticeable that the $R^2_{CV}$ values are often considerably smaller than the $R^2_{VAL}$ values, which was also observed by Zepp et al. (2021) [5].
Figure 7 reveals scale-specific dependencies. Accordingly, the variation range of the $R^2_{A,VAL}$ values (0.43 to 0.62) is smaller than that of the $R^2_{B,VAL}$ (0.47 to 0.84) and $R^2_{C,VAL}$ values (0.43 to 0.84). The $RMSE_{VAL}$ values vary between 2.78 and 3.47 (A), 2.00 and 3.28 (B), and 1.99 and 3.34 (C). With $R^2_{A,VAL} = 0.62$ and $RMSE_{A,VAL} = 2.78$, the highest $R^2_{A,VAL}$ value is associated with scale level L = 6.
The use of SCMaP-SRC parameters leads to a significant model improvement. Two main scale ranges can be distinguished, with scale level L = 3 as the boundary. The highest $R^2_{B,VAL}$ value is found at scale level L = 0.6. The joint use of terrain attributes and SCMaP-SRC parameters leads to further slight model improvements. At the scale level L = 0.3, which corresponds to the original grid resolution of 30 m, the accuracy metrics again drop significantly. The $RMSE_{VAL}$ and $1 - SLOPE_{VAL}$ values display similar characteristics with opposite minima and maxima.
Each scale level is characterized by scatterplots of the predicted and observed SOC values with respect to the test data. Figure 8 illustrates this for all three parametrization variants and the scale levels L = 0.6 and L = 6, which can be considered as representatives of both scale ranges. Besides the measures $R^2$ and RMSE, the slope of the regression line (SLOPE) as an indicator for model over- or underestimation is displayed. According to Figure 7, all models lead to a SOC underestimation. The degree of underestimation is the lowest for the SRC-based models in the scale range between L = 0.4 and L = 1. It is also noticeable that the pure SRC models show better results than the mixed TA and SRC models. Both effects can be observed in Figure 8.
The most accurate modeling variant regarding $R^2$ and RMSE with “scale level L = 0.6, parametrization variant C” (Figure 8f) is mapped in Figure 9. The SOC content pattern reflects the soil and terrain landscape structure with higher SOC values in lowlands and lower SOC values in hilly regions (cf., Figure 4). This is also true for the prediction “scale level L = 6, parametrization variant A” (Figure 8a), which makes the main landscape structures visible. However, a visual comparison of both variants shows more detailed differentiation and higher spatial variability of predicted SOC contents for the L = 0.6 variant, which is particularly pronounced in the lowlands. This is also where the greatest differences in the value distributions between the two variants can be observed, which lie roughly in the value range between 4 and 8% SOC content (Figure 10). Furthermore, a comparison of the two SOC content distributions with the distribution of the training data set shows that the L = 6 prediction variant deviates more than the L = 0.6 variant, as shown by the Kolmogorov–Smirnov (KS) distances of the corresponding empirical cumulative distribution functions (ECDF) (cf., [6]).
Table 4 summarizes the recursive feature elimination results. There, the most relevant parameters, which lead to minimal RMSE values, are listed. In addition to the one-dimensional terrain features FILL, SLP, TOP, and TON, variants of the multi-hierarchical terrain features NH and (subordinately) VDC have the greatest influence on the modeling results. As for the SRC attributes, in particular the normalized multi-temporal Landsat band 5 ($SRC_{12}$) as well as the bands 2 ($SRC_{2}$, $SRC_{9}$) and 3 ($SRC_{3}$, $SRC_{10}$) are important. The analysis of the combined parameterization variants reveals the dominance of the SRC attributes at all scale levels. Figure 11 visualizes the example of two modeling variants that, with only a few attributes, lead to a significant reduction of RMSE values.

4. Discussion

4.1. Data Quality and Fitness-for-Use

Approved data quality is a prerequisite for the reusability of data. The SOC content modeling results presented are examples of standardized soil mapping products. Standardization refers to reproducible data processing and modeling, as well as their evaluation based on accuracy metrics [6,69,70,71]. In contrast to static conventional soil maps, the scale-specific suitability can be determined, which helps to communicate map quality to end-users [71,72,73], to provide additional information about data fitness-for-use [74,75], to improve the model’s interpretability [4], and to obtain an additional geospatial provenance description [76]. Although the process chain presented is reproducible, individual steps are based on expert knowledge. This concerns in particular the selection of segmentation parameters (Section 2.1.4) and terrain attributes as well as their multi-scale tuning parameters (Table 1). Here, further research is needed to define statistically sound parameters [25,67].
The validation scheme in this study follows the approach of Zepp et al. (2021) [5] and the recommendations of Piikki et al. (2021) [73] including data splitting, cross-validation, and independent validation, as well as the use of different types of accuracy metrics. The latter were primarily used to compare different scale-specific parameterization variants. Maps for practical use should also contain uncertainty metrics, which estimate the prediction variation for every raster cell. Geostatistical metrics or prediction intervals (e.g., [18,61,77,78,79]) are widely used.

4.2. Scale-Specific Optimization

The SOC content map quality is affected by factors such as the spatial and temporal representativeness of the samples or the scale-specific explanatory power of the variables used. Following the effective map scale (EMS) approach [30], each scale-specific map is characterized by its “predictive efficiency” [33]. The underlying workflow can be considered as a procedure where the relationship between SOC content samples and different aggregation levels of multi-scale terrain attributes and SCMaP-SRCs is statistically optimized. This is also evident in the comparison of the modeling results based on the filter-specific parameterization (Section 3.1), which represents a static window-based aggregation procedure. In contrast to changes in grid resolution of terrain attributes [26,28], the segmentation-based aggregation considers both parameter-specific and spatial data variability. In this way, a more precise delineation of the reference units can be made. This is relevant, for example, for samples taken at field boundaries [5]. This means that the optimization can counteract possible positional inaccuracies of the samples [27,80].
The accuracy measure $R^2$ of the best modeling SCMaP-SRC variant (L = 0.6, variant B with $R^2 = 0.81$; Table 3) exceeds the result of Zepp et al. (2021) ($R^2 = 0.74$; Table 2) [5], whereas the RMSE values are the same for both models (RMSE = 2.11). It can be assumed that the accuracy measures of the SCMaP-SRC variant L = 0.3 (variant B), which corresponds to the original raster resolution, are exceeded by both variants of the filter- and scale-specific parameterization models. The additional consideration of terrain attributes leads to a further model improvement regarding both accuracy measures (L = 0.6, variant C with $R^2 = 0.84$ and RMSE = 1.99; Table 3).
The prediction results reveal a jump in scale (cf., Figure 7). This corresponds to concepts of hierarchical landscape structuring, according to which (here: soil-relevant) processes and states are associated with specific scale ranges [30,81,82]. Accordingly, SCMaP-SRC-related accuracy measures in particular show significant differences around the L = 3 level, with almost the same spectral bands having the highest impact on predictions at all scale levels. Compared to terrain attributes, SCMaP-SRC parameters are also characterized by a higher explanatory power at fine scales, especially below scale level L = 3.
TA-related accuracy measures display a smaller and more balanced variation across scale levels. One reason might be that various expressions of terrain attributes have been used as explanatory variables. They represent variations regarding scale or terrain complexity. This means that in addition to scale optimization, the terrain attribute variations also serve as optimization variables [30]. Of the multi-scale terrain attributes, the NH and the VDC variants are particularly relevant at different scale levels, which Guo et al. (2019) also consider as key attributes that influence SOC distribution [26]. While below scale level L = 3 the multi-scale attribute variant $NH_{4}$ dominates, above scale level L = 3 other NH or VDC variants appear. This underlines the scale dependence of the soil-related processes, for which scale-specific parameters have to be identified as optimal for prediction [28,83,84]. Finally, it is noticeable that the one-dimensional attribute FILL has the highest explanatory power. Other one-dimensional attributes of high relevance are SLP and TON/TOP.
From a machine learning perspective, all used explanatory variables represent “handcrafted” features whose selection is based on domain or expert knowledge [85]. This mainly concerns the determination of multi-scale tuning parameters (cf., Table 1) and scale levels (see Section 2.1.4). Although reproducible, there is potential for unsupervised and statistically driven approaches for the derivation of the parameters (e.g., [56,86]). This is also true for object-based contextual parameters [56], which have not been considered in this study.

4.3. SCMaP-SRC as Additional Input for SOC Modeling

The results shown in Section 3.2 indicate an increase in the SOC model performances using SCMaP-SRC data in addition to terrain attributes (see Figure 7). Though the R 2 values (both model calibration and validation) for TA and TA+SRC point to high model performances, the R M S E are relatively high (>1.99). The federal state of Bavaria shows a wide range of SOC contents, as mineral and organic soils occur. As the focus of the subset definition was on the selection of a representative subarea of the entire federal state, a possible wide range of SOC contents was included here. Hence, the high R M S E could be related to the wide range of SOC contents in the study area. Relatively high R M S E values for SOC modeling in Bavaria were also reported by Zepp et al. (2021) [5]. According to the results of the recursive feature elimination shown in Table 4, the most important SRC attributes are bands 2 and 3 which are selected over all different scale levels. Zepp et al. (2021) also showed the importance of the bands 2 and 3 for the SOC modeling [5]. Additionally, band 12 (band 5 normalized) is of high importance for SOC modeling.
To investigate the influence of combining terrain attributes with the SCMaP-SRC information, the same remote sensing database as in Zepp et al. (2021) [5] was used. The SOC modeling was performed using a spatial subset of the 30-year SCMaP-SRC data. The 30-year compositing period provides stable conditions and mainly captures permanent spatial soil moisture differences related to soil texture or soil type. Influencing factors such as short-term variations in soil moisture thus have a smaller effect than in analyses based on shorter compositing periods or single scenes. However, this assumption still needs to be tested. Additionally, a long compositing period allows a high number of cloud-free scenes to be integrated, which contributes to a reliable data source [49]. Here, the 30-year period was applicable because SOC contents in Bavaria are largely constant, as shown, among others, by Kühnel et al. (2020) [87]: based on permanent observation sites, no or only small SOC changes were observed between 1984 and 2016. However, the use of a 30-year composite could hamper the SOC prediction if the investigation area is subject to short-term SOC changes or changes over several years. For transferring the presented modeling techniques to other areas with faster SOC changes, shorter compositing periods have to be considered. In addition, shorter compositing periods would enable an investigation of the impacts of political regulations (e.g., carbon farming [88] or denser modeling of soil health and status). The integration of Sentinel-2 data can potentially shorten the compositing period, as the twin satellites provide a large amount of data owing to their combined revisit time of less than five days [89,90]. Additionally, the globally available Harmonized Landsat Sentinel-2 (HLS) surface reflectance data set [91] can be considered. Both harmonized data sets are based on the same pre-processing schemes, which, together with the large number of available scenes, makes the data set a highly valuable input for the compositing approach.
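The trade-off between compositing length and the number of usable observations can be sketched for a single pixel. The Python example below is a simplified illustration. It assumes that a composite value is the mean reflectance over all cloud-free bare-soil observations within the compositing period, whereas the actual SCMaP processing chain [48] is more involved. The function name `composite_pixel` and the observation values are hypothetical.

```python
def composite_pixel(observations, start_year, end_year):
    """Average the bare-soil reflectances of one pixel within a period.

    observations: iterable of (year, reflectance, is_bare_soil) tuples.
    Returns (mean reflectance or None, number of observations used).
    Assumption: the composite is a plain per-pixel mean (simplification).
    """
    valid = [refl for year, refl, is_bare in observations
             if start_year <= year <= end_year and is_bare]
    if not valid:
        return None, 0  # no usable scene in this period
    return sum(valid) / len(valid), len(valid)


# Hypothetical observations of a single pixel (year, reflectance, bare soil?).
pixel = [
    (1985, 0.10, True),
    (1995, 0.20, False),  # vegetated or cloudy, hence excluded
    (2005, 0.30, True),
    (2013, 0.20, True),
]

long_period = composite_pixel(pixel, 1984, 2014)   # uses three observations
short_period = composite_pixel(pixel, 2010, 2014)  # uses one observation
no_data = composite_pixel(pixel, 1990, 2000)       # no bare-soil observation
print(long_period, short_period, no_data)
```

Under these assumptions, a shorter period rests on fewer observations or none at all, which is why denser acquisitions such as Sentinel-2 or HLS are relevant when the compositing length is reduced.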

5. Conclusions

In this study, approaches of multi-scale feature engineering, geographic object-based image analysis (GEOBIA), and machine learning were coupled in a workflow in which the relations between SOC content samples and different aggregation levels of multi-scale terrain attributes and multi-temporal soil reflectance composites are optimized. The main findings of the study can be summarized as follows:
  • There are scale-specific dependencies between the representativeness of the soil samples and the explanatory power of the variables used.
  • Compared to terrain attributes, parameters based on multi-temporal soil reflectance composites are characterized by a higher explanatory power at fine scales.
  • The explanatory power of terrain attributes is generally smaller but more balanced across scale levels.
  • The best modeling variant is characterized by an accuracy of $R^2 = 0.84$ and RMSE = 1.99, which outperforms modeling results based on a static window-based aggregation procedure with $R^2 = 0.74$ and RMSE = 2.11.
  • The study results suggest that DSM workflows should include scale-related optimizations.
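The aggregation step behind the scale levels can be sketched as a zonal statistic. The Python example below is a minimal illustration, not the authors' R implementation. It assumes that each pixel carries a segment label per scale level and that the mean is used as the aggregation statistic; the function name `aggregate_by_segment` and all values are hypothetical.

```python
from collections import defaultdict


def aggregate_by_segment(values, segment_ids):
    """Return the mean of the pixel values for every segment label.

    values: pixel values of one explanatory variable (None = no data).
    segment_ids: segment label of each pixel at one scale level.
    Assumption: the mean is the aggregation statistic.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, segment in zip(values, segment_ids):
        if value is None:
            continue  # skip no-data pixels
        sums[segment] += value
        counts[segment] += 1
    return {segment: sums[segment] / counts[segment] for segment in sums}


# Four hypothetical pixels of one explanatory variable.
pixels = [1.0, 3.0, 5.0, 7.0]
fine_level = ["a", "a", "b", "b"]    # small objects (fine scale level)
coarse_level = ["x", "x", "x", "x"]  # one large object (coarse scale level)

fine = aggregate_by_segment(pixels, fine_level)
coarse = aggregate_by_segment(pixels, coarse_level)
print(fine, coarse)
```

A soil sample located in segment "b" would be paired with the value 6.0 at the fine level and 4.0 at the coarse level. Repeating the model fit for each level and comparing the validation metrics is the kind of scale-related optimization the findings refer to.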

Author Contributions

Conceptualization, M.M. and U.H.; methodology, M.M. and S.Z.; software, M.M.; validation, M.M. and M.W.; formal analysis, M.M.; investigation, M.M.; data curation, M.M., S.Z. and M.W.; writing—original draft preparation, M.M. and S.Z.; visualization, M.M.; supervision, M.M., U.H. and H.G.; project administration, H.G.; funding acquisition, U.H. and H.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the German Federal Ministry of Food and Agriculture (BMEL), grant number 281B301816 as part of the Soil-DE project “Entwicklung von Indikatoren zur Bewertung der Ertragsfähigkeit, Nutzungsintensität und Vulnerabilität landwirtschaftlich genutzter Böden in Deutschland”.

Data Availability Statement

Not applicable.

Acknowledgments

We thank the Bavarian agencies, the Bavarian Environment Agency (LfU) and the Bavarian State Research Center for Agriculture (LfL) for providing the soil databases.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

  1. Minasny, B.; Malone, B.P.; McBratney, A.B.; Angers, D.A.; Arrouays, D.; Chambers, A.; Chaplot, V.; Chen, Z.S.; Cheng, K.; Das, B.S.; et al. Soil carbon 4 per mille. Geoderma 2017, 292, 59–86.
  2. Lorenz, K.; Lal, R. Soil Organic Carbon—An Appropriate Indicator to Monitor Trends of Land and Soil Degradation within the SDG Framework? Number 77/2016 in UBA-Texte; Umweltbundesamt: Dessau-Roßlau, Germany, 2016.
  3. Prechtel, A.; von Lützow, M.; Uwe Schneider, B.; Bens, O.; Bannick, C.G.; Kögel-Knabner, I.; Hüttl, R.F. Organic carbon in soils of Germany: Status quo and the need for new data to evaluate potentials and trends of soil carbon sequestration. J. Plant Nutr. Soil Sci. 2009, 172, 601–614.
  4. Tziolas, N.; Tsakiridis, N.; Chabrillat, S.; Demattê, J.A.M.; Ben-Dor, E.; Gholizadeh, A.; Zalidis, G.; van Wesemael, B. Earth observation data-driven cropland soil monitoring: A review. Remote Sens. 2021, 13, 4439.
  5. Zepp, S.; Heiden, U.; Bachmann, M.; Wiesmeier, M.; Steininger, M.; van Wesemael, B. Estimation of soil organic carbon contents in croplands of Bavaria from SCMaP soil reflectance composites. Remote Sens. 2021, 13, 3141.
  6. Möller, M.; Koschitzki, T.; Hartmann, K.J.; Jahn, R. Plausibility test of conceptual soil maps using relief parameters. CATENA 2012, 88, 57–67.
  7. Lokers, R.; Knapen, R.; Janssen, S.; van Randen, Y.; Jansen, J. Analysis of Big Data technologies for use in agro-environmental science. Environ. Model. Softw. 2016, 84, 494–504.
  8. Wiesmeier, M.; Spörlein, P.; Geuß, U.; Hangen, E.; Haug, S.; Reischl, A.; Schilling, B.; Lützow, M.; Kögel-Knabner, I. Soil organic carbon stocks in southeast Germany (Bavaria) as affected by land use, soil type and sampling depth. Glob. Chang. Biol. 2012, 18, 2233–2245.
  9. Orgiazzi, A.; Ballabio, C.; Panagos, P.; Jones, A.; Fernández-Ugalde, O. LUCAS Soil, the largest expandable soil dataset for Europe: A review. Eur. J. Soil Sci. 2018, 69, 140–153.
  10. Poeplau, C.; Jacobs, A.; Don, A.; Vos, C.; Schneider, F.; Wittnebel, M.; Tiemeyer, B.; Heidkamp, A.; Prietz, R.; Flessa, H. Stocks of organic carbon in German agricultural soils—Key results of the first comprehensive inventory. J. Plant Nutr. Soil Sci. 2020, 183, 665–681.
  11. Minasny, B.; McBratney, A. Digital soil mapping: A brief history and some lessons. Geoderma 2016, 264, 301–311.
  12. Lausch, A.; Baade, J.; Bannehr, L.; Borg, E.; Bumberger, J.; Chabrillat, S.; Dietrich, P.; Gerighausen, H.; Glässer, C.; Hacker, J.; et al. Linking remote sensing and geodiversity and their traits relevant to biodiversity—Part I: Soil characteristics. Remote Sens. 2019, 11, 2356.
  13. Demattê, J.A.M.; Safanelli, J.L.; Poppiel, R.R.; Rizzo, R.; Silvero, N.E.Q.; Mendes, W.d.S.; Bonfatti, B.R.; Dotto, A.C.; Salazar, D.F.U.; Mello, F.A.d.O.; et al. Bare earth’s surface spectra as a proxy for soil resource monitoring. Sci. Rep. 2020, 10, 4461.
  14. Dvorakova, K.; Heiden, U.; van Wesemael, B. Sentinel-2 exposed soil composite for soil organic carbon prediction. Remote Sens. 2021, 13, 1791.
  15. Mello, F.A.; Bellinaso, H.; Mello, D.C.; Safanelli, J.L.; Mendes, W.D.S.; Amorim, M.T.; Gomez, A.M.; Poppiel, R.R.; Silvero, N.E.; Gholizadeh, A.; et al. Soil parent material prediction through satellite multispectral analysis on a regional scale at the Western Paulista Plateau, Brazil. Geoderma Reg. 2021, 26, e00412.
  16. Silvero, N.E.Q.; Demattê, J.A.M.; Amorim, M.T.A.; Santos, N.V.d.; Rizzo, R.; Safanelli, J.L.; Poppiel, R.R.; Mendes, W.d.S.; Bonfatti, B.R. Soil variability and quantification based on Sentinel-2 and Landsat-8 bare soil images: A comparison. Remote Sens. Environ. 2021, 252, 112117.
  17. Vaudour, E.; Gomez, C.; Lagacherie, P.; Loiseau, T.; Baghdadi, N.; Urbina-Salazar, D.; Loubet, B.; Arrouays, D. Temporal mosaicking approaches of Sentinel-2 images for extending topsoil organic carbon content mapping in croplands. Int. J. Appl. Earth Obs. Geoinf. 2021, 96, 102277.
  18. Žížala, D.; Minařík, R.; Skála, J.; Beitlerová, H.; Juřicová, A.; Reyes Rojas, J.; Penížek, V.; Zádorová, T. High-resolution agriculture soil property maps from digital soil mapping methods, Czech Republic. CATENA 2022, 212, 106024.
  19. Luo, C.; Wang, Y.; Zhang, X.; Zhang, W.; Liu, H. Spatial prediction of soil organic matter content using multiyear synthetic images and partitioning algorithms. CATENA 2022, 211, 106023.
  20. Luo, C.; Zhang, X.; Wang, Y.; Men, Z.; Liu, H. Regional soil organic matter mapping models based on the optimal time window, feature selection algorithm and Google Earth Engine. Soil Tillage Res. 2022, 219, 105325.
  21. Möller, M.; Volk, M.; Friedrich, K.; Lymburner, L. Placing soil-genesis and transport processes into a landscape context: A multiscale terrain-analysis approach. J. Plant Nutr. Soil Sci. 2008, 171, 419–430.
  22. Behrens, T.; Schmidt, K.; Zhu, A.X.; Scholten, T. The ConMap approach for terrain-based digital soil mapping. Eur. J. Soil Sci. 2010, 61, 133–143.
  23. Deumlich, D.; Schmidt, R.; Sommer, M. A multiscale soil-landform relationship in the glacial-drift area based on digital terrain analysis and soil attributes. J. Plant Nutr. Soil Sci. 2010, 173, 843–851.
  24. Behrens, T.; Schmidt, K.; Ramirez-Lopez, L.; Gallant, J.; Zhu, A.X.; Scholten, T. Hyper-scale digital soil mapping and soil formation analysis. Geoderma 2014, 213, 578–588.
  25. Behrens, T.; Schmidt, K.; MacMillan, R.A.; Viscarra Rossel, R.A. Multi-scale digital soil mapping with deep learning. Sci. Rep. 2018, 8, 15244.
  26. Guo, Z.; Adhikari, K.; Chellasamy, M.; Greve, M.B.; Owens, P.R.; Greve, M.H. Selection of terrain attributes and its scale dependency on soil organic carbon prediction. Geoderma 2019, 340, 303–312.
  27. Wadoux, A.M.J.C.; Padarian, J.; Minasny, B. Multi-source data integration for soil mapping using deep learning. SOIL 2019, 5, 107–119.
  28. Dornik, A.; Cheţan, M.A.; Drăguţ, L.; Dicu, D.D.; Iliuţă, A. Optimal scaling of predictors for digital mapping of soil properties. Geoderma 2022, 405, 115453.
  29. Drăguţ, L.; Eisank, C. Object representations at multiple scales from digital elevation models. Geomorphology 2011, 129, 183–189.
  30. Möller, M.; Volk, M. Effective map scales for soil transport processes and related process domains—Statistical and spatial characterization of their scale-specific inaccuracies. Geoderma 2015, 247–248, 151–160.
  31. Radoux, J.; Bourdouxhe, A.; Coos, W.; Dufrêne, M.; Defourny, P. Improving ecotope segmentation by combining topographic and spectral data. Remote Sens. 2019, 11, 354.
  32. Minár, J.; Evans, I.S. Elementary forms for land surface segmentation: The theoretical basis of terrain analysis and geomorphological mapping. Geomorphology 2008, 95, 236–259.
  33. MacMillan, R.; Shary, P. Landforms and landform elements in geomorphometry. In Geomorphometry: Concepts, Software, Applications; Developments in Soil Science; Elsevier: Amsterdam, The Netherlands, 2009; Volume 33, pp. 227–254.
  34. Dornik, A.; Drăguţ, L.; Urdea, P. Classification of Soil Types Using Geographic Object-Based Image Analysis and Random Forests. Pedosphere 2018, 28, 913–925.
  35. Coelho, F.F.; Giasson, E.; Campos, A.R.; de Oliveira e Silva, R.G.P.; Costa, J.J.F. Geographic object-based image analysis and artificial neural networks for digital soil mapping. CATENA 2021, 206, 105568.
  36. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2021.
  37. IUSS Working Group WRB. World Reference Base for Soil Resources 2014: International Soil Classification System for Naming Soils and Creating Legends for Soil Maps; Food and Agriculture Organization (FAO) of the United Nations: Rome, Italy, 2014.
  38. Thas, O. Comparing Distributions; Springer Series in Statistics; Springer: New York, NY, USA; London, UK, 2010.
  39. McBratney, A.; Mendonca Santos, M.; Minasny, B. On digital soil mapping. Geoderma 2003, 117, 3–52.
  40. Lamichhane, S.; Kumar, L.; Wilson, B. Digital soil mapping algorithms and covariates for soil organic carbon mapping and their implications: A review. Geoderma 2019, 352, 395–413.
  41. Conrad, O.; Bechtel, B.; Bock, M.; Dietrich, H.; Fischer, E.; Gerlitz, L.; Wehberg, J.; Wichmann, V.; Böhner, J. System for Automated Geoscientific Analyses (SAGA) v. 2.1.4. Geosci. Model Dev. 2015, 8, 1991–2007.
  42. Guisan, A.; Weiss, S.B.; Weiss, A.D. GLM versus CCA spatial modeling of plant species distribution. Plant Ecol. 1999, 143, 107–122.
  43. Yokoyama, R.; Shirasawa, M.; Pike, R.J. Visualizing topography by openness: A new application of image processing to digital elevation models. Photogramm. Eng. Remote Sens. 2002, 68, 257–265.
  44. Beven, K.J.; Kirkby, M.J. A physically based, variable contributing area model of basin hydrology/Un modèle à base physique de zone d’appel variable de l’hydrologie du bassin versant. Hydrol. Sci. Bull. 1979, 24, 43–69.
  45. Olaya, V. Chapter 6 Basic Land-Surface Parameters. In Developments in Soil Science; Elsevier: Amsterdam, The Netherlands, 2009; Volume 33, pp. 141–169.
  46. Planchon, O.; Darboux, F. A fast, simple and versatile algorithm to fill the depressions of digital elevation models. CATENA 2002, 46, 159–176.
  47. Zevenbergen, L.W.; Thorne, C.R. Quantitative analysis of land surface topography. Earth Surf. Process. Landf. 1987, 12, 47–56.
  48. Rogge, D.; Bauer, A.; Zeidler, J.; Mueller, A.; Esch, T.; Heiden, U. Building an exposed soil composite processor (SCMaP) for mapping spatial and temporal characteristics of soils with Landsat imagery (1984–2014). Remote Sens. Environ. 2018, 205, 1–17.
  49. Zepp, S.; Jilge, M.; Metz-Marconcini, A.; Heiden, U. The influence of vegetation index thresholding on EO-based assessments of exposed soil masks in Germany between 1984 and 2019. ISPRS J. Photogramm. Remote Sens. 2021, 178, 366–381.
  50. Wulder, M.A.; Loveland, T.R.; Roy, D.P.; Crawford, C.J.; Masek, J.G.; Woodcock, C.E.; Allen, R.G.; Anderson, M.C.; Belward, A.S.; Cohen, W.B.; et al. Current status of Landsat program, science, and applications. Remote Sens. Environ. 2019, 225, 127–147.
  51. Zhu, Z.; Woodcock, C.E. Object-based cloud and cloud shadow detection in Landsat imagery. Remote Sens. Environ. 2012, 118, 83–94.
  52. Zhu, Z.; Wang, S.; Woodcock, C.E. Improvement and expansion of the Fmask algorithm: Cloud, cloud shadow, and snow detection for Landsats 4–7, 8, and Sentinel 2 images. Remote Sens. Environ. 2015, 159, 269–277.
  53. Richter, R.; Schläpfer, D. Atmospheric/Topographic Correction for Satellite Imagery/ATCOR-2/3 User Guide; Technical Report Version 8.3.1; ReSe Applications: Wil, Switzerland, 2014.
  54. Benz, U.C.; Hofmann, P.; Willhauck, G.; Lingenfelder, I.; Heynen, M. Multi-resolution, object-oriented fuzzy analysis of remote sensing data for GIS-ready information. ISPRS J. Photogramm. Remote Sens. 2004, 58, 239–258.
  55. Blaschke, T.; Hay, G.J.; Kelly, M.; Lang, S.; Hofmann, P.; Addink, E.; Queiroz Feitosa, R.; van der Meer, F.; van der Werff, H.; van Coillie, F.; et al. Geographic Object-Based Image Analysis—Towards a new paradigm. ISPRS J. Photogramm. Remote Sens. 2014, 87, 180–191.
  56. Chen, G.; Weng, Q.; Hay, G.J.; He, Y. Geographic object-based image analysis (GEOBIA): Emerging trends and future opportunities. GIScience Remote Sens. 2018, 55, 159–182.
  57. Johnson, B.A.; Ma, L. Image segmentation and object-based image analysis for environmental monitoring: Recent areas of interest, researchers’ views on the future priorities. Remote Sens. 2020, 12, 1772.
  58. Baston, D. Exactextractr: Fast Extraction from Raster Datasets Using Polygons. 2021. Available online: https://cran.r-project.org/web/packages/exactextractr/ (accessed on 13 March 2022).
  59. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  60. Zhang, G.l.; Liu, F.; Song, X.d. Recent progress and future prospect of digital soil mapping: A review. J. Integr. Agric. 2017, 16, 2871–2885.
  61. Hengl, T.; Nussbaum, M.; Wright, M.N.; Heuvelink, G.B.; Gräler, B. Random forest as a generic framework for predictive modeling of spatial and spatio-temporal variables. PeerJ 2018, 6, e5518.
  62. Khaledian, Y.; Miller, B.A. Selecting appropriate machine learning methods for digital soil mapping. Appl. Math. Model. 2020, 81, 401–418.
  63. Dormann, C.F.; Elith, J.; Bacher, S.; Buchmann, C.; Carl, G.; Carré, G.; Marquéz, J.R.G.; Gruber, B.; Lafourcade, B.; Leitão, P.J.; et al. Collinearity: A review of methods to deal with it and a simulation study evaluating their performance. Ecography 2013, 36, 27–46.
  64. Kuhn, M.; Johnson, K. Applied Predictive Modeling; Springer: New York, NY, USA, 2013.
  65. Guyon, I.; Weston, J.; Barnhill, S.; Vapnik, V. Gene selection for cancer classification using support vector machines. Mach. Learn. 2002, 46, 389–422.
  66. Svetnik, V.; Liaw, A.; Tong, C.; Wang, T. Application of Breiman’s random forest to modeling structure-activity relationships of pharmaceutical molecules. In Multiple Classifier Systems; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2004; Volume 3077, pp. 334–343.
  67. Drăguţ, L.; Tiede, D.; Levick, S.R. ESP: A tool to estimate scale parameter for multiresolution image segmentation of remotely sensed data. Int. J. Geogr. Inf. Sci. 2010, 24, 859–871.
  68. Kuhn, M.; Johnson, K. Feature Engineering and Selection: A Practical Approach for Predictive Models, 1st ed.; Chapman and Hall/CRC: Boca Raton, FL, USA, 2019.
  69. MacMillan, R. Experiences with Applied DSM: Protocol, Availability, Quality and Capacity Building. In Digital Soil Mapping with Limited Data; Hartemink, A.E., McBratney, A., Mendonça-Santos, M.d.L., Eds.; Springer: Dordrecht, The Netherlands, 2008; pp. 113–135.
  70. Arrouays, D.; McBratney, A.; Bouma, J.; Libohova, Z.; Richer-de Forges, A.C.; Morgan, C.L.; Roudier, P.; Poggio, L.; Mulder, V.L. Impressions of digital soil maps: The good, the not so good, and making them ever better. Geoderma Reg. 2020, 20, e00255.
  71. Kidd, D.; Searle, R.; Grundy, M.; McBratney, A.; Robinson, N.; O’Brien, L.; Zund, P.; Arrouays, D.; Thomas, M.; Padarian, J.; et al. Operationalising digital soil mapping—Lessons from Australia. Geoderma Reg. 2020, 23, e00335.
  72. Hengl, T.; MacMillan, R.A. Predictive Soil Mapping with R; OpenGeoHub Foundation: Wageningen, The Netherlands, 2019.
  73. Piikki, K.; Wetterlind, J.; Söderström, M.; Stenberg, B. Perspectives on validation in digital soil mapping of continuous attributes—A review. Soil Use Manag. 2021, 37, 7–21.
  74. Wentz, E.A.; Shimizu, M. Measuring spatial data fitness-for-use through multiple criteria decision making. Ann. Am. Assoc. Geogr. 2018, 108, 1150–1167.
  75. Höck, H.; Toussaint, F.; Thiemann, H. Fitness for Use of Data Objects Described with Quality Maturity Matrix at Different Phases of Data Production. Data Sci. J. 2020, 19, 45.
  76. Closa, G.; Masó, J.; Zabala, A.; Pesquer, L.; Pons, X. A provenance metadata model integrating ISO geospatial lineage and the OGC WPS: Conceptual model and implementation. Trans. GIS 2019, 23, 1102–1124.
  77. Vaysse, K.; Lagacherie, P. Using quantile regression forest to estimate uncertainty of digital soil mapping products. Geoderma 2017, 291, 55–64.
  78. Szatmári, G.; Pásztor, L. Comparison of various uncertainty modelling approaches based on geostatistics and machine learning algorithms. Geoderma 2019, 337, 1329–1340.
  79. Kasraei, B.; Heung, B.; Saurette, D.D.; Schmidt, M.G.; Bulmer, C.E.; Bethel, W. Quantile regression as a generic approach for estimating uncertainty of digital soil maps produced from machine-learning. Environ. Model. Softw. 2021, 144, 105139.
  80. Grimm, R.; Behrens, T. Uncertainty analysis of sample locations within digital soil mapping approaches. Geoderma 2010, 155, 154–163.
  81. Wu, J. Hierarchy and Scaling: Extrapolating Information along a Scaling Ladder. Can. J. Remote Sens. 1999, 25, 367–380.
  82. Volk, M.; Möller, M.; Wurbs, D. A pragmatic approach for soil erosion risk assessment within policy hierarchies. Land Use Policy 2010, 27, 997–1009.
  83. Behrens, T.; Zhu, A.X.; Schmidt, K.; Scholten, T. Multi-scale digital terrain analysis and feature selection for digital soil mapping. Geoderma 2010, 155, 175–185.
  84. Behrens, T.; Viscarra Rossel, R.A. On the interpretability of predictors in spatial data science: The information horizon. Sci. Rep. 2020, 10, 16737.
  85. Verdonck, T.; Baesens, B.; Óskarsdóttir, M.; vanden Broucke, S. Special issue on feature engineering editorial. Mach. Learn. 2021.
  86. Behrens, T.; Viscarra Rossel, R.A.; Kerry, R.; MacMillan, R.; Schmidt, K.; Lee, J.; Scholten, T.; Zhu, A.X. The relevant range of scales for multi-scale contextual spatial modelling. Sci. Rep. 2019, 9, 14800.
  87. Kühnel, A.; Wiesmeier, M.; Kögel-Knabner, I.; Spörlein, P. Veränderungen der Humusqualität und -Quantität bayerischer Böden im Klimawandel; Technical Report; Bayerisches Landesamt für Umwelt: Hof, Germany, 2020.
  88. Sharma, M.; Kaushal, R.; Kaushik, P.; Ramakrishna, S. Carbon farming: Prospects and challenges. Sustainability 2021, 13, 11122.
  89. Lacroix, P.; Bièvre, G.; Pathier, E.; Kniess, U.; Jongmans, D. Use of Sentinel-2 images for the detection of precursory motions before landslide failures. Remote Sens. Environ. 2018, 215, 507–516.
  90. Ienco, D.; Interdonato, R.; Gaetano, R.; Ho Tong Minh, D. Combining Sentinel-1 and Sentinel-2 Satellite image time series for land cover mapping via a multi-source deep learning architecture. ISPRS J. Photogramm. Remote Sens. 2019, 158, 11–22.
  91. Claverie, M.; Ju, J.; Masek, J.G.; Dungan, J.L.; Vermote, E.F.; Roger, J.C.; Skakun, S.V.; Justice, C. The Harmonized Landsat and Sentinel-2 surface reflectance data set. Remote Sens. Environ. 2018, 219, 145–161.
Figure 1. Workflow for the scale-specific SOC content prediction based on SCMaP Soil Reflectance Composites (SCMaP-SRC) and terrain attributes (RU: reference units; RFE: recursive feature elimination).
Figure 2. Test site location in Bavaria and the distribution of soil samples. Projection: EPSG 31468 (https://spatialreference.org/ref/epsg/31468; accessed on 13 March 2022).
Figure 3. Comparison of soil samples’ SOC content [%] distributions: density plot with quantile values Q (left) and plot of empirical cumulative distribution functions (ECDF) with ECDF distance D (right) for entire Bavaria (black) and the subset (red; cf., Figure 2).
Figure 4. Visualization of multi-scale (a,b) and one-dimensional terrain attributes ((c); cf. Table 1) as well as of selected SRC bands (d). Projection: EPSG 31468 (https://spatialreference.org/ref/epsg/31468; accessed on 13 March 2022).
Figure 5. Visualization of reference unit-specific scale levels using the example of a test site subset (a) as well as the relation between the logarithmized scale level L and the object number ON ((b); cf. Table 3). The colored dashed lines refer to the scale levels L = 0.3, L = 1, and L = 10. White areas in (a) represent land that is not used for agriculture.
Figure 6. Comparison of independent validation results based on SOC content (%) data sets for entire Bavaria (a) and the subset ((b), cf., Table 2). The red line is the regression line of the scatter plot, the gray and dashed line indicates the optimal regression line.
Figure 7. Relations between the logarithmized scale level L and the accuracy metrics $R^2_{VAL}$ (a), $RMSE_{VAL}$ (b), and $SLOPE_{VAL}$ (c) (cf. Table 3) for the parametrization variants terrain attributes (TA), SCMaP-SRC (SRC), and terrain attributes + SCMaP-SRC (TA + SRC). The gray dashed lines refer to the scale levels L = 0.6 and L = 6 (cf. Figure 8 and Figure 9).
Figure 8. Comparison of SOC content (%) validation results based on test data sets for six prediction variants (cf., Table 3). The red line is the regression line of the scatter plot, the gray and dashed line indicates the optimal regression line.
Figure 9. Predicted SOC content (%) values for scale level L = 0.6 and parametrization variant terrain attributes + SCMaP-SRC (TA+SRC) (left) as well as scale level L = 6 and parametrization variant terrain attributes (TA) (cf., Table 3 and Figure 8a,f). Projection: EPSG 31468.
Figure 10. Comparison of soil samples’ (red) and predicted SOC content (%) distributions (black/blue): density plots with quantile values Q (left) and plots of empirical cumulative distribution functions (ECDF) with Kolmogorov–Smirnov (KS) distances D (right) for two prediction variants.
Figure 11. RFE-based dependencies between RMSE values and parameter combinations for scale level L = 0.6 and parametrization variant terrain attributes + SCMaP-SRC (TA+SRC) (a) as well as scale level L = 6 and parametrization variant terrain attributes (TA) ((b); Table 4). The dashed blue lines indicate minimal RMSE values.
Table 1. Explanatory variables for the SOC content prediction: terrain attribute variants and SCMaP-SRC bands (SM1: for definitions of terrain attribute variants and tuning parameters, see function fNumP() and the R function collection at https://github.com/FLFgit/ScaleP/blob/master/callScaleP.R; accessed on 13 March 2022).

| Explanatory Variable | Meaning | Multi-Scale Tuning Parameter (Start and End Value) | Variant Number | Source |
|---|---|---|---|---|
| FILL | Digital Elevation Model with filled sinks | | 1 | [46] |
| SLP | Slope | | 1 | [47] |
| VDC | Vertical Distance above Channel Network | Catchment Area (CA ∈ [10,000:1,000,000]) | 10 | [41] |
| TCI | Terrain Classification Index | Catchment Area (CA ∈ [10,000:1,000,000]) | 10 | [41] |
| TWI | Topographic Wetness Index | | 1 | [44] |
| MBI | Mass Balance Index | Curvature Transfer Constant (T ∈ [0.0001:0.1]) | 10 | [21] |
| TOP | Topographic (positive) Openness | | 1 | [43] |
| TON | Topographic (negative) Openness | | 1 | [43] |
| NH | Normalized Height | Generalization Parameter (t ∈ [2:1000]) | 10 | [41] |
| TPI | Topographic Position Index | Scale Parameter (S ∈ [20:1000]) | 10 | [42] |
| $SRC_{1-7}$ | SCMaP-SRC (1984–2014), Landsat Reflectances | | 7 | [48] |
| $SRC_{8-14}$ | SCMaP-SRC (1984–2014), normalized Landsat Reflectances | | 7 | [48] |
Table 2. Accuracy metrics for the SOC content (%) prediction based on SCMaP-SRC parametrization applied by Zepp et al. (2021) [5] for Bavaria and the test site subset.

| Variant | Sample Number | $RMSE_{CV}$ | $R^2_{CV}$ | $RMSE_{CAL}$ | $R^2_{CAL}$ | $RMSE_{VAL}$ | $R^2_{VAL}$ | $SLOPE_{VAL}$ |
|---|---|---|---|---|---|---|---|---|
| Bavaria | 939 | 1.29 | 0.62 | 0.54 | 0.94 | 1.32 | 0.65 | 0.58 |
| Subset | 220 | 2.30 | 0.60 | 1.00 | 0.92 | 2.11 | 0.74 | 0.74 |
Table 3. Scale-specific accuracy metrics for SOC content (%) prediction variants based on terrain attributes (subscript A; cf. Table 1), SCMaP-SRC (subscript B) and both data sets (subscript C). R M S E —root mean square error; R 2 —coefficient of determination; subscript CV—cross validation; subscript CAL—calibration; subscript VAL—independent validation. The gray and bold emphasized values refer to the prediction variant with highest accuracy metrics (Figure 9 and the corresponding scatter plots (Figure 8). The gray, bold and red values emphasized is related to the best prediction variant (cf., Figure 8f).
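| Scale Level | Object Number | RMSE_A,CV | R²_A,CV | RMSE_A,CAL | R²_A,CAL | RMSE_A,VAL | R²_A,VAL | SLOPE_A,VAL | RMSE_B,CV | R²_B,CV | RMSE_B,CAL | R²_B,CAL | RMSE_B,VAL | R²_B,VAL | SLOPE_B,VAL | RMSE_C,CV | R²_C,CV | RMSE_C,CAL | R²_C,CAL | RMSE_C,VAL | R²_C,VAL | SLOPE_C,VAL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 18183 | 2.8931 | 0.4385 | | | 3.26 | 0.53 | 0.35 | 2.57 | 0.45 | 1.33 | 0.87 | 3.28 | 0.47 | 0.42 | 2.61 | 0.44 | 1.32 | 0.87 | 3.34 | 0.48 | 0.41 |
| 9 | 54306 | 2.90 | 0.32 | 1.34 | 0.87 | 2.96 | 0.58 | 0.38 | 2.49 | 0.51 | 1.28 | 0.86 | 3.01 | 0.53 | 0.44 | 2.54 | 0.50 | 1.18 | 0.90 | 3.18 | 0.51 | 0.43 |
| 8 | 55011 | 3.04 | 0.27 | 1.37 | 0.86 | 3.07 | 0.57 | 0.34 | 2.73 | 0.43 | 1.30 | 0.86 | 2.97 | 0.53 | 0.45 | 2.73 | 0.44 | 1.39 | 0.85 | 2.92 | 0.59 | 0.39 |
| 7 | 56113 | 2.86 | 0.33 | 1.11 | 0.90 | 2.91 | 0.61 | 0.35 | 2.55 | 0.44 | 1.18 | 0.88 | 3.01 | 0.58 | 0.48 | 2.57 | 0.44 | 1.09 | 0.91 | 2.88 | 0.64 | 0.45 |
| 6 | 57718 | 2.85 | 0.33 | 1.10 | 0.89 | 2.78 | 0.62 | 0.36 | 2.39 | 0.52 | 1.11 | 0.90 | 2.65 | 0.65 | 0.51 | 2.49 | 0.48 | 1.10 | 0.90 | 2.76 | 0.65 | 0.48 |
| 5 | 60338 | 2.75 | 0.36 | 1.15 | 0.89 | 3.00 | 0.56 | 0.34 | 2.51 | 0.45 | 1.17 | 0.88 | 2.88 | 0.55 | 0.48 | 2.53 | 0.45 | 1.07 | 0.90 | 3.00 | 0.55 | 0.39 |
| 4 | 65018 | 2.76 | 0.36 | 1.12 | 0.90 | 3.26 | 0.43 | 0.33 | 2.39 | 0.49 | 1.00 | 0.92 | 2.95 | 0.49 | 0.46 | 2.46 | 0.48 | 0.94 | 0.92 | 3.17 | 0.43 | 0.41 |
| 3 | 74575 | 2.64 | 0.41 | 0.97 | 0.91 | 3.47 | 0.46 | 0.34 | 2.19 | 0.59 | 0.98 | 0.92 | 2.88 | 0.59 | 0.48 | 2.22 | 0.58 | 0.86 | 0.94 | 2.74 | 0.62 | 0.46 |
| 2 | 98630 | 2.85 | 0.32 | 0.95 | 0.91 | 3.03 | 0.56 | 0.34 | 2.19 | 0.61 | 0.97 | 0.91 | 2.35 | 0.73 | 0.60 | 2.28 | 0.57 | 0.89 | 0.94 | 2.43 | 0.71 | 0.50 |
| 1 | 203379 | 2.62 | 0.41 | 0.85 | 0.93 | 3.31 | 0.52 | 0.37 | 2.15 | 0.60 | 0.90 | 0.94 | 2.16 | 0.81 | 0.69 | 2.20 | 0.58 | 0.85 | 0.95 | 2.35 | 0.78 | 0.61 |
| 0.9 | 235484 | 2.73 | 0.37 | 0.93 | 0.93 | 3.32 | 0.50 | 0.35 | 2.14 | 0.61 | 0.81 | 0.95 | 2.39 | 0.75 | 0.66 | 2.22 | 0.58 | 0.86 | 0.94 | 2.35 | 0.76 | 0.59 |
| 0.8 | 280084 | 2.75 | 0.36 | 0.90 | 0.94 | 3.10 | 0.54 | 0.34 | 2.22 | 0.58 | 0.90 | 0.94 | 2.00 | 0.84 | 0.73 | 2.25 | 0.56 | 0.78 | 0.95 | 2.02 | 0.84 | 0.66 |
| 0.7 | 345398 | 2.80 | 0.33 | 0.93 | 0.93 | 3.18 | 0.54 | 0.34 | 2.24 | 0.57 | 0.90 | 0.94 | 2.23 | 0.79 | 0.72 | 2.26 | 0.56 | 0.82 | 0.94 | 2.30 | 0.79 | 0.65 |
| 0.6 | 450939 | 2.81 | 0.34 | 0.93 | 0.92 | 3.21 | 0.54 | 0.33 | 2.18 | 0.60 | 0.86 | 0.94 | 2.11 | 0.81 | 0.70 | 2.20 | 0.59 | 0.85 | 0.94 | 1.99 | 0.84 | 0.63 |
| 0.5 | 631794 | 2.79 | 0.35 | 0.89 | 0.93 | 3.29 | 0.55 | 0.33 | 2.28 | 0.57 | 0.96 | 0.93 | 2.14 | 0.81 | 0.71 | 2.31 | 0.55 | 0.85 | 0.94 | 2.06 | 0.83 | 0.65 |
| 0.4 | 965994 | 2.86 | 0.31 | 0.93 | 0.93 | 3.44 | 0.53 | 0.35 | 2.43 | 0.52 | 0.96 | 0.93 | 2.31 | 0.82 | 0.66 | 2.39 | 0.52 | 0.89 | 0.93 | 2.27 | 0.82 | 0.65 |
| 0.3 | 1515513 | 2.83 | 0.32 | 0.96 | 0.93 | 3.30 | 0.58 | 0.36 | 2.12 | 0.65 | 0.84 | 0.95 | 3.10 | 0.65 | 0.53 | 2.16 | 0.63 | 0.81 | 0.95 | 3.18 | 0.67 | 0.50 |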
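Identifying the best prediction variant from Table 3 amounts to a search over scale levels and data set variants. The sketch below uses a handful of validation values copied from Table 3 and picks the combination with the lowest RMSE_VAL. Using the minimum validation RMSE as the selection criterion is an assumption made for illustration, and only four of the seventeen scale levels are included to keep the example short.

```python
from typing import Dict, Tuple

# (scale level, variant) -> (RMSE_VAL, R2_VAL), values copied from Table 3.
# Variant A: terrain attributes, B: SCMaP-SRC, C: both data sets.
validation: Dict[Tuple[float, str], Tuple[float, float]] = {
    (1.0, "A"): (3.31, 0.52), (1.0, "B"): (2.16, 0.81), (1.0, "C"): (2.35, 0.78),
    (0.8, "A"): (3.10, 0.54), (0.8, "B"): (2.00, 0.84), (0.8, "C"): (2.02, 0.84),
    (0.6, "A"): (3.21, 0.54), (0.6, "B"): (2.11, 0.81), (0.6, "C"): (1.99, 0.84),
    (0.3, "A"): (3.30, 0.58), (0.3, "B"): (3.10, 0.65), (0.3, "C"): (3.18, 0.67),
}

# Assumed criterion: the best variant is the one with the lowest validation RMSE.
best_key = min(validation, key=lambda key: validation[key][0])
best_rmse, best_r2 = validation[best_key]

print(best_key, best_rmse, best_r2)
```

With these values the search returns scale level 0.6 with the combined data set (variant C), whose RMSE of 1.99 and R² of 0.84 agree with the best modeling variant reported in the abstract.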
Table 4. The most important parameters based on the RFE algorithm, which lead to minimal RMSE values. The order of the parameters reflects their importance. The gray-marked table cells refer to Figure 11.
| Scale Level | TA | SRC | TA+SRC |
|---|---|---|---|
| 10.0 | FILL, TOP, SLP, MBI_1 | SRC_10, SRC_12, SRC_3, SRC_1, SRC_9, SRC_4, SRC_2, SRC_11, SRC_15, SRC_7 | SRC_10, SRC_9, SRC_12, SRC_3, SRC_2, SRC_11, SRC_4, SRC_1, NH_2, SRC_15, SRC_7 |
| 9.0 | FILL, TOP, VDC_599484, MBI_1 | SRC_10, SRC_12, SRC_2, SRC_7, SRC_3, SRC_9, SRC_11, SRC_5, SRC_4, SRC_1, SRC_13, SRC_6 | SRC_10, SRC_2, SRC_9, SRC_3, SRC_11, SRC_12, SRC_7, SRC_15, SRC_4 |
| 8.0 | FILL, NH_4, TOP | SRC_10, SRC_12, SRC_7, SRC_3, SRC_2 | SRC_10, SRC_3, SRC_2, SRC_9, SRC_12 |
| 7.0 | TOP, FILL, TON | SRC_10, SRC_3, SRC_12, SRC_4 | SRC_3, SRC_10, SRC_4, SRC_12, SRC_2, SRC_15, VDC_46416, SRC_1, SRC_9, TON, TOP, SRC_5, SRC_7, VDC_359381, SRC_11, FILL |
| 6.0 | FILL, TOP, NH_4, SLP, VDC_46416 | SRC_12, SRC_3, SRC_10 | SRC_3, SRC_12, SRC_2, SRC_10, SRC_15 |
| 5.0 | FILL, NH_4, TOP, SLP, NH_8 | SRC_12, SRC_3, SRC_10 | SRC_3, SRC_15, SRC_4, SRC_2, SRC_10, SRC_12, FILL, SRC_9, SRC_13, SRC_7, SRC_1, TOP, SRC_11, SLP, NH_4 |
| 4.0 | FILL, TOP, NH_4, NH_8 | SRC_3, SRC_12, SRC_10, SRC_15, SRC_1 | SRC_3, SRC_2, SRC_15, SRC_12, SRC_10, SRC_4, SRC_1, SRC_9, SRC_7, SRC_5, SRC_13, SRC_11 |
| 3.0 | FILL, SLP | SRC_12, SRC_3, SRC_10, SRC_4, SRC_2, SRC_11, SRC_15, SRC_9, SRC_1 | SRC_3, SRC_12, SRC_10, SRC_2, SRC_4, SRC_15, SRC_9, SRC_11, SRC_1, SRC_5, SRC_13, SLP, VDC_359381, TON, SRC_7, VDC_16681 |
| 2.0 | FILL, NH_4 | SRC_12, SRC_3, SRC_10, SRC_2, SRC_14, SRC_4, SRC_9, SRC_1, SRC_5, SRC_13 | SRC_3, SRC_2, SRC_12, SRC_4, SRC_5, SRC_10, SRC_1, SRC_9, SRC_15, SRC_14, SRC_7, SRC_13, FILL, VDC_10000, SLP, SRC_11, VDC_599484, VDC_359381 |
| 1.0 | FILL, NH_4, SLP | SRC_12, SRC_10, SRC_2, SRC_3, SRC_4, SRC_9, SRC_11, SRC_14, SRC_8, SRC_1, SRC_5, SRC_13, SRC_7, SRC_6 | SRC_12, SRC_2, SRC_3, SRC_4, SRC_10, SRC_14, SRC_5, SRC_9, SRC_1, SRC_13, SRC_11, FILL, SRC_15, NH_4, VDC_129155, SLP |
| 0.9 | FILL, NH_4 | SRC_12, SRC_2, SRC_10, SRC_3, SRC_9, SRC_4, SRC_11, SRC_1, SRC_14 | SRC_12, SRC_2, SRC_3, SRC_10, SRC_4, SRC_9, VDC_129155, SRC_1, SRC_14, SRC_5 |
| 0.8 | FILL, NH_4 | SRC_12, SRC_10, SRC_2 | SRC_12, SRC_2, SRC_3, SRC_10 |
| 0.7 | FILL, NH_4 | SRC_12, SRC_10, SRC_2 | SRC_12, SRC_2, SRC_3, SRC_10, SRC_4, SRC_13 |
| 0.6 | FILL, NH_4 | SRC_12, SRC_2, SRC_10, SRC_3, SRC_9, SRC_4, SRC_13, SRC_8, SRC_11, SRC_5 | SRC_12, SRC_2, SRC_3, SRC_10, SRC_4, SRC_13, SRC_9, VDC_129155, SRC_5, NH_4, SRC_1, SRC_15, FILL |
| 0.5 | FILL, NH_4, TON | SRC_12, SRC_2, SRC_3, SRC_10 | SRC_2, SRC_12, SRC_3, SRC_13, SRC_10, SRC_4, FILL, VDC_129155 |
| 0.4 | FILL, NH_4 | SRC_12, SRC_2, SRC_3, SRC_10 | SRC_2, SRC_3, SRC_12 |
| 0.3 | FILL, NH_4, TON | SRC_2, SRC_12, SRC_10, SRC_3, SRC_9, SRC_13, SRC_7, SRC_1, SRC_8, SRC_4 | SRC_2, SRC_3, SRC_13, SRC_12, VDC_359381, SRC_4, SRC_10, TON, NH_2, SRC_1, FILL, NH_4, SRC_7, SRC_5, VDC_10000, SLP, SRC_11, VDC_46416 |
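The parameter lists in Table 4 result from recursive feature elimination (RFE): a model is fitted repeatedly, the least important parameter is dropped each time, and the subset with the lowest RMSE is kept. The sketch below shows only this general loop. The callback `fit_and_rank` is hypothetical and stands in for training a model such as a Random Forest; the toy callback, its weights and its error formula are invented for illustration and do not reproduce the study's implementation or results.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical callback: fits a model on the given parameters and returns
# (RMSE, importance score per parameter).
FitFn = Callable[[List[str]], Tuple[float, Dict[str, float]]]


def recursive_feature_elimination(features: List[str], fit_and_rank: FitFn) -> Tuple[List[str], float]:
    """Return the subset with minimal RMSE, ordered by decreasing importance."""
    current = list(features)
    best_subset: List[str] = []
    best_rmse = float("inf")
    while current:
        rmse, importance = fit_and_rank(current)
        ranked = sorted(current, key=lambda name: importance[name], reverse=True)
        if rmse < best_rmse:
            best_rmse, best_subset = rmse, ranked
        current = ranked[:-1]  # eliminate the least important parameter
    return best_subset, best_rmse


def toy_fit(subset: List[str]) -> Tuple[float, Dict[str, float]]:
    """Invented stand-in for model fitting; weights and formula are not from the study."""
    weights = {"SRC_12": 0.5, "SRC_2": 0.3, "FILL": 0.1, "TWI": 0.0}
    # Informative parameters lower the error; every parameter adds a small penalty.
    rmse = 3.0 - sum(weights[name] for name in subset) + 0.05 * len(subset)
    return rmse, {name: weights[name] for name in subset}


selected, selected_rmse = recursive_feature_elimination(["FILL", "TWI", "SRC_2", "SRC_12"], toy_fit)
print(selected, round(selected_rmse, 2))
```

In the toy run the uninformative parameter is eliminated first, and the remaining three are returned in order of importance, mirroring how the order of parameters within each cell of Table 4 is to be read.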
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Möller, M.; Zepp, S.; Wiesmeier, M.; Gerighausen, H.; Heiden, U. Scale-Specific Prediction of Topsoil Organic Carbon Contents Using Terrain Attributes and SCMaP Soil Reflectance Composites. Remote Sens. 2022, 14, 2295. https://doi.org/10.3390/rs14102295
