4.1. AGD Importance to Predict Soil Attributes through DSM
From the observed results, the modeling with the AGD obtained better performance in terms of R², RMSE, and MAE for both prediction algorithms used for modeling the selected soil attributes. Ref. [
10] performed a similar study by applying different combinations of geophysical sensors (measured in situ) for modeling soil properties, and, in general, the modeling without geophysical sensors also showed the poorest results. According to [
10], gamma-ray spectrometry and magnetic susceptibility were the best combinations of geophysical data. The comparison of the results of this study with those obtained by [
10] is represented in
Table 9, except for ASat.
The comparison between the studies shows that the results obtained in the present study were satisfactory. The most significant discrepancy is in the performance of the Clay models, which had lower R² values than those observed by [
10]. For BS, the performance of this study was better for both models: 0.22 versus 0.17 for the RF model and 0.23 versus 0.11 for the SVM model. For CEC, the performance of this study was better for the RF model (0.23 versus 0.14). For OC, both models also performed better, most notably the RF model, with 0.16 in this study versus 0.05 in [
10].
Ref. [
9] also analyzed different models to predict topsoil particle-size distribution, with and without gamma-ray spectrometry, to replace lithology maps. According to the authors, model performance increased significantly across all particle sizes when gamma-ray spectrometry was used instead of lithology, permitting the creation of more pedologically meaningful maps. Another example is presented by [
66]. Soil properties were measured in situ using several sensors to test the performance of the individual sensors and their combination to enhance soil property predictions (organic carbon, sum of bases, CEC, clay content, volumetric moisture, and bulk density). In this case, the X-ray fluorescence spectrometer sensor was superior. However, the gamma-ray sensor was the second best among individual sensors for predicting all those soil properties and the best for predicting CEC values.
The better performance of modeling with AGD was also confirmed through a comparison with NULL RMSE and NULL MAE values. In modeling with AGD, all NULL RMSE and NULL MAE values were higher than the corresponding model values for all soil attributes. Algorithms whose RMSE and MAE values exceed those of the NULL method perform worse than simply using the mean value for the entire area [
10].
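The NULL comparison described above can be illustrated with a minimal Python sketch. It assumes that the NULL method predicts the mean of the training samples for every validation sample; the function names and the numeric values are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error between two equal-length sequences."""
    absolute = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(absolute) / len(absolute)


def null_baseline(train_values, validation_values):
    """NULL RMSE and NULL MAE: every validation sample is predicted
    as the mean of the training values (assumed NULL definition)."""
    mean_value = sum(train_values) / len(train_values)
    constant = [mean_value] * len(validation_values)
    return rmse(validation_values, constant), mae(validation_values, constant)


def beats_null(train_values, validation_values, model_predictions):
    """True when the model has lower RMSE and lower MAE than the NULL method."""
    null_rmse, null_mae = null_baseline(train_values, validation_values)
    return (rmse(validation_values, model_predictions) < null_rmse
            and mae(validation_values, model_predictions) < null_mae)


# Hypothetical values, for illustration only.
train = [10.0, 20.0, 30.0, 40.0]
validation = [15.0, 35.0]
predictions = [18.0, 31.0]
model_is_useful = beats_null(train, validation, predictions)
```

A model is worth keeping under this criterion only when both of its error metrics fall below the NULL values, which is the situation reported here for modeling with AGD.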
Ref. [
67] used the same database of the same area but with a different approach to predict soil properties. In their research, the modeling was performed only once for each attribute, without AGD, using the cross-validation method to evaluate the models’ performance. The results in terms of R² were 0.19 for Clay 0–5 cm using ordinary kriging, 0.19 for Clay 5–15 cm using ordinary kriging, and 0.18 for Clay 15–30 cm using regression kriging. For OC, the results were 0.06 for 0–5 cm using linear regression, 0.07 for 5–15 cm using linear regression, and 0.11 for 15–30 cm using regression tree.
Although the average accuracy of the model’s performance to predict Clay content has a lower R² value in this study (0.15 for RF and 0.11 for SVM in modeling with AGD), the number of times the models reached R² ≥ 0.20 was 32 for RF and 12 for SVM in Clay modeling, as shown in
Table 7. For OC, the result presented by [
67] (R² = 0.11) is lower than that observed for the RF model in this study (0.16). Additionally, the number of times the models reached R² ≥ 0.20 was 33 for RF and three for SVM in OC modeling.
In this case, another point that can be analyzed is that the generalized harmonization to a depth of 0–30 cm in the present study seems not to have affected the performance of the models. Since the different depth-interval results (0–5, 5–15, and 15–30 cm) shown by [
67] for Clay and OC contents did not show a significant performance increase, and considering that the results of the present study were obtained through 100 models and that the validation was made from a set of unknown samples, it can be said that the results are more reliable and showed better performance than those presented by [
67] to predict soil properties in the same study area.
4.2. Soil Properties and AGD Relationships
The results show that AGD combined with terrain and Sentinel-2 covariates played an essential role in predicting soil properties in the study area. AGDs were commonly observed as essential predictors for ASat, BS, CEC, Clay, and OC for both models (
Figure 10 and
Figure 11). However, some covariates stand out considering Spearman’s correlation analyses (
Figure 12). The ratios eTh/K and eU/K are examples, exhibiting positive correlations with ASat (0.37 and 0.29, respectively) and negative with BS (−0.34 and −0.22, respectively).
Interpreting the ratios between the elements’ concentrations helps characterize different lithotypes and highlights zones of radioelement enrichment and alteration [
17]. Assuming that K is more mobile and tends to be leached from the weathering profile in tropical and subtropical climates, while eTh and eU are generally retained in the weathering profile and associated with clays, oxides, and resistant minerals, it is possible to establish relationships between weathering and erosion rates [
15]. These relationships also agree with the observed results, where less-weathered soils have relatively higher values of BS (low values of these ratios). In contrast, more evolved soils are depleted in bases and are more acidic, and consequently have higher values of ASat (high values of these ratios). Therefore, the radioactive response largely depends on the evolutionary history of the landscape [
15].
The “Map Prediction and Uncertainty” analysis supports this hypothesis. The central region, which has high ASat values, mainly comprises Ferralsols and Acrisols. On the other hand, the highest BS values in the NW and SE regions correspond to the locations where Cambisol prevails. The eU correlations with ASat (0.21) and BS (−0.18) follow the same pattern. Ref. [
11] also demonstrated a negative correlation between BS and gamma-ray uranium. In
Appendix A,
Figure A1, attached to this paper, the airborne gamma-ray spectrometry maps of eTh/K and eU/K also support this idea. The areas with the highest values correspond to the high values observed in the average map for ASat (
Figure 6a), and the lower values correspond to the high values observed in the average map for BS (
Figure 6b), while the CV% maps show lower values, and thus greater reliability, in these regions (
Figure 7a,b). This relationship is essential as it can be a good indicator of soil fertility since ASat and BS are used for this purpose [
68].
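For readers less familiar with this kind of uncertainty map, the Python sketch below shows how an average map and a CV% map could be derived from repeated model runs. It assumes that CV% is the sample standard deviation of the predictions at each pixel divided by their mean, times 100; this definition, the function names, and the values are assumptions for illustration only.

```python
import statistics


def mean_and_cv_percent(predictions):
    """Mean and coefficient of variation (%) of repeated predictions
    for a single pixel. CV% is undefined when the mean is zero."""
    mean_value = statistics.fmean(predictions)
    if mean_value == 0:
        return mean_value, float("nan")
    cv_percent = 100.0 * statistics.stdev(predictions) / abs(mean_value)
    return mean_value, cv_percent


def summarize_runs(runs):
    """runs: list of model runs, each a list of per-pixel predictions.
    Returns the per-pixel average map and CV% map as flat lists."""
    per_pixel = list(zip(*runs))  # regroup values by pixel
    summary = [mean_and_cv_percent(values) for values in per_pixel]
    average_map = [m for m, _ in summary]
    cv_map = [cv for _, cv in summary]
    return average_map, cv_map


# Three hypothetical runs over two pixels.
runs = [
    [8.0, 10.0],
    [10.0, 10.0],
    [12.0, 10.0],
]
average_map, cv_map = summarize_runs(runs)
```

Under this definition, a pixel where the repeated models agree has a low CV% and is read as more reliable, whereas a high CV% flags areas where the prediction changes with the training subset.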
GY highlights anomalies perpendicular to its direction (in the x direction). It highlights superficial magnetic anomalies (
Figure A2 in
Appendix A), helping map geological contacts and shallow features, such as lineament distribution [
17]. Since lineament distribution influences groundwater flow [
69,
70], the tested hypothesis was to find a correlation between the drainage system and valley regions. In this sense, the GY correlated with Clay content (−0.13), where areas with low GY values present Clay deposits. According to [
71], clay values, among other soil properties, are significantly related to landscape position and tend to decrease downslope.
The negative correlation with GY corroborates the hypothesis. However, the weak correlation value and the lack of significant correlation with other AGDs did not conform to what was expected. Furthermore, Clay content demonstrated an inverse relationship with GX (
Figure A3 in
Appendix A), suggesting that Clay values would change according to the directions of the applied filters, which does not make sense from a pedological point of view and indicates that these covariates may not be reliable for predicting soil properties. Despite the low performance observed in the present study, the use of AGD data to predict Clay contents has shown satisfactory results elsewhere, as reported by [
9,
10,
11,
18].
Factor F (or F-parameter) is a valuable tool for maximizing and distinguishing areas characterized by potassium enrichment resulting from hydrothermal alterations [
58]. This covariate showed a positive correlation with CEC (0.19), suggesting that high potassium values are related to high CEC values, confirmed by the positive correlations with Kperc (0.32) and negative with eU/K (−0.16). These results agree with [
11], showing a positive correlation between CEC and gamma-ray potassium (0.42). However, the CV% maps do not support the idea. Although the CV% maps for the SVM model demonstrate good reliability for the areas corresponding to the highest CEC values (
Figure 9c and
Figure 8c, respectively), the RF model does not follow the same pattern (
Figure 7c and
Figure 6c, respectively), demonstrating duality in the results and, consequently, low reliability.
In
Appendix A,
Figure A1, the Kperc and Factor F maps mainly match with higher elevation areas, as shown in
Figure 1. The areas of occurrence of Conselheiro Paulino and Sana granites, for example, highlight high potassium values as they represent undeformed granites resistant to weathering [
29,
32]. So, in this case, the high potassium values are related to the parent material, not the soil potassium content, explaining why BS did not correlate well with Kperc and Factor F. Furthermore, knowing that CEC is the sum of bases plus H⁺ and Al³⁺ [
68], it makes sense that CEC is related to potassium concentrations, but since it is also associated with aluminum, it may have confused the models.
The Kperc covariate showed a positive correlation with OC (0.32), as did Factor F (0.25). Ref. [
72] found that radiometric potassium plays a crucial role in predicting soil carbon in Northern Ireland. Nevertheless, the relationship is reversed in that study, with a correlation value of −0.51. Soils with high organic carbon content significantly diminish gamma rays’ intensity, as confirmed by the negative correlation with radiometric thorium (−0.36) and radiometric uranium (−0.29).
According to [
73], soil carbon has a high spatial variation, mainly where the land cover was altered for different purposes in tropical areas. To obtain an accurate picture of the carbon content in tropical regions, gathering data from a wide range of locations to account for this variation is essential. Ref. [
11] found the same pattern observed in the present study, showing a positive correlation between OC and radiometric potassium (0.17) in a study area in southeastern Brazil with a climate classified as Cwa. The difference between the results can be explained mainly by climatic conditions: in subtropical and tropical climates, the positive correlation between OC and K could be related to topsoil erosion.
However, as observed for CEC, the CV% maps for OC demonstrated duality in the data. Although the CV% maps for the SVM model show good reliability for the areas corresponding to the highest OC values (
Figure 9e and
Figure 8e, respectively), the RF model did not follow the same pattern (
Figure 7e and
Figure 6e, respectively). So, further studies are needed to understand the relationship between the distribution of radioelements and organic carbon contents.
According to [
10,
11,
18], gamma-ray spectrometry and magnetic susceptibility data can be associated with soil attributes. However, there is still a gap in understanding the optimal covariates and their potential combinations to further investigate soil weathering, pedogenesis, and their relationship with soil attributes, especially when using airborne geophysical data, where appropriate scale can be an issue.
4.3. Precautions and Challenges
Based on the discussed results, some considerations should be made. First, data availability and quality are essential for more reliable modeling. Regrettably, the current soil databases lack the necessary comprehensiveness and precision to support the utilization of soil information [
1]. Digital soil modeling relies on accurate and comprehensive soil data. However, available soil data are generally insufficient or of varying quality, making it challenging to build reliable models based on a representative input soil dataset. Improving data collection methods, standardizing data formats, and enhancing data-sharing practices are crucial for better results [
5]. For example, in this study, although the analysis of the models and some possible relationships with soil properties suggest that AGD can be a helpful tool for soil mapping, more reliable results could be achieved if the sample density were representative. Of the 208 samples, 97 are Ferralsols, 62 are Acrisols, and 35 are Cambisols, whereas other classes reported in the study area are poorly represented: Leptosols (3), Fluvisols (5), Gleysols (5), and Nitisols (1).
The number of samples also affects the model’s validation and evaluation. Although the proportion for training (75%) and validation (25%) is suitable for DSM, only 156 samples were used for training and 52 for validation. Appropriate-scale sampling and sampling design are other issues.
Appendix A shows the correlation table of the studied properties and all the proposed covariates (
Figure A3). The observed correlations were well below expectations, mainly for terrain covariates widely used in soil mapping [
5]. This problem may be related to the density of samples collected, as shown in the SE of the map (
Figure 1). According to [
74], the statistical parameters are sensitive to the number and the locations of the soil observations. In this sense, sample distribution according to scale and number of samples is essential to ensure the accuracy and reliability of the models.
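The effect of the sample counts reported above on a 75/25 partition can be checked with a short Python sketch. The class counts and proportions come from the text; the simple random split, the fixed seed, and the function names are assumptions, since the exact partitioning procedure is not described in this section.

```python
import random

# Soil class counts reported for the 208 samples.
class_counts = {
    "Ferralsols": 97,
    "Acrisols": 62,
    "Cambisols": 35,
    "Leptosols": 3,
    "Fluvisols": 5,
    "Gleysols": 5,
    "Nitisols": 1,
}


def split_summary(counts, train_fraction=0.75):
    """Total, training, and validation sizes, plus the number of validation
    samples expected per class under a proportional split."""
    total = sum(counts.values())
    n_train = round(total * train_fraction)
    n_valid = total - n_train
    expected_valid = {name: n * (1 - train_fraction) for name, n in counts.items()}
    return total, n_train, n_valid, expected_valid


def random_split(sample_ids, train_fraction=0.75, seed=0):
    """Simple random split (an assumed procedure) with a fixed seed."""
    rng = random.Random(seed)
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


total, n_train, n_valid, expected_valid = split_summary(class_counts)
train_ids, valid_ids = random_split(range(total))
```

With these counts, fewer than two validation samples are expected for each of the minority classes (for example, 0.75 for Leptosols), so many random partitions contain none of them. This is consistent with the concern about representativeness raised above.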
Spatial and temporal variability are also a challenge since soils exhibit considerable variability, and most DSM studies typically concentrate on predicting soil properties for a specific period [
5]. Topography, land use, climate, and geological processes are examples of factors that influence soil properties, and capturing this variability accurately in digital soil models is challenging. Bom Jardim County is a good example of an area that undergoes variations mainly through anthropic interferences associated with agricultural activities and the dominant mountainous relief [
25], which may explain the low significance of the Sentinel-2 MSI-based indices, since the soil samples were collected between 2009 and 2011 and the Sentinel-2 images were from 2021.
DSM often faces several challenges since soil observations are scarce and costly. However, achieving satisfactory results with AGD is still possible, as reported by [
7,
8,
9,
75,
76,
77]. After extensive research over the past years, DSM has made significant progress in producing soil maps and has become a credible alternative for meeting the increasing worldwide demand for spatial soil information [
2].