Article
Peer-Review Record

Improved Surface Soil Organic Carbon Mapping of SoilGrids250m Using Sentinel-2 Spectral Images in the Qinghai–Tibetan Plateau

Remote Sens. 2023, 15(1), 114; https://doi.org/10.3390/rs15010114
by Jiayi Yang 1,2, Junjian Fan 2, Zefan Lan 1,2, Xingmin Mu 1,2, Yiping Wu 3, Zhongbao Xin 4, Puqiong Miping 5 and Guangju Zhao 1,2,6,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 16 November 2022 / Revised: 11 December 2022 / Accepted: 22 December 2022 / Published: 25 December 2022
(This article belongs to the Special Issue Remote Sensing in Natural Resource and Water Environment)

Round 1

Reviewer 1 Report (Previous Reviewer 1)

Table 3 is missing in the revised version.

I have no additional comments.

Author Response

Thank you for pointing this out. We have fixed the error.

Reviewer 2 Report (Previous Reviewer 2)

This manuscript of Yang et al. predicted the SOC spatial distribution of QTP by calibrating Soil Information Grids Products (250 m) with intense field measurements by several machine learning models. High resolution satellite images including Sentinel-2 data were also applied in the modelling. Generally, this study is based on good design and intense data analysis and model simulation, and the results provide a case study of digital soil mapping with Sentinel-2 data over complex landscapes. The structure of the manuscript is well designed and the logic is easy to follow. The discussion is detailed and suggestive. I believe that this research is potentially a good contribution for carbon neutrality in the QTP and a reference for the digital mapping of SOC in complex terrain. However, some minor corrections should be further considered to improve the clarity of the manuscript. Therefore, I suggest a minor revision before accepting for publication in Remote Sensing.

 

Line 35: “field measured”? Perhaps it should be “field measurements”.

Lines 39-41: It should be clarified here that the R2 of 0.82 is from the training dataset; the R2 for the validation dataset may be more suitable here.

Line 49: “in the alpine cold region” should be “in alpine regions”.

Line 152: The SOC data from Liu et al. (2022) actually have higher accuracy than SoilGrids 250m for China, so why did you not use their product as an environmental covariate instead of SoilGrids 250m?

Line 413: It may be better to compare your result with more current SOC products, including WISE30sec, GSDE, and GSOCmap. An example can be found below.

 

Lin, Z., Dai, Y., Mishra, U., Wang, G., Shangguan, W., Zhang, W., and Qin, Z.: On the magnitude and uncertainties of global and regional soil organic carbon: A comparative analysis using multiple estimates, Earth Syst. Sci. Data Discuss. [preprint], https://doi.org/10.5194/essd-2022-232, in review, 2022.

 

Line 399: Liu et al. (2022) provided 90 m resolution soil products of China, so it is better to compare your product with their SOC products.

 

Line 418: It is better to add an additional paragraph in the discussion section, and give limitations and directions for future work.

Author Response

Reviewer #2: This manuscript of Yang et al. predicted the SOC spatial distribution of QTP by calibrating Soil Information Grids Products (250 m) with intense field measurements by several machine learning models. High resolution satellite images including Sentinel-2 data were also applied in the modelling. Generally, this study is based on good design and intense data analysis and model simulation, and the results provide a case study of digital soil mapping with Sentinel-2 data over complex landscapes. The structure of the manuscript is well designed and the logic is easy to follow. The discussion is detailed and suggestive. I believe that this research is potentially a good contribution for carbon neutrality in the QTP and a reference for the digital mapping of SOC in complex terrain. However, some minor corrections should be further considered to improve the clarity of the manuscript. Therefore, I suggest a minor revision before accepting for publication in Remote Sensing.

 

Response to the comments:

Line 35: “field measured”? Perhaps it should be “field measurements”.

This has been revised in the manuscript.

Lines 39-41: It should be clarified here that the R2 of 0.82 is from the training dataset; the R2 for the validation dataset may be more suitable here.

Thanks for your nice suggestions. We have revised it.

Line 49: “in the alpine cold region” should be “in alpine regions”.

We have revised the manuscript.

Line 152: The SOC data from Liu et al. (2022) actually have higher accuracy than SoilGrids 250m for China, so why did you not use their product as an environmental covariate instead of SoilGrids 250m?

The 250 m resolution data are the result of resampling the 90 m resolution data, so the results are not much different; the 250 m version is what we obtained when requesting the data.

Line 413: It may be better to compare your result with more current SOC products, including WISE30sec, GSDE, and GSOCmap. An example can be found below.

Lin, Z., Dai, Y., Mishra, U., Wang, G., Shangguan, W., Zhang, W., and Qin, Z.: On the magnitude and uncertainties of global and regional soil organic carbon: A comparative analysis using multiple estimates, Earth Syst. Sci. Data Discuss. [preprint], https://doi.org/10.5194/essd-2022-232, in review, 2022.

Thanks for your nice suggestions. The HWSD was produced by linking regional and national soil property information to the soil map of the world using taxonomy-based pedotransfer functions. Li, et al. developed SOC maps of the topsoil (0–20 cm) and the first meter of soil at 1 km by 1 km spatial resolution over China for the 1980s, 2000s, and 2010s, based on a machine learning approach and extensive measurements. We compared our results with the HWSD and Li’s prediction: the HWSD is one of the most coherent and widely used global soil databases, and Li’s prediction is also more up to date. These two databases were therefore used as references for the comparative analysis. By comparing against existing observation-based SOC results, we can quantify the quality of our SOC distribution predictions.

Line 399: Liu et al. (2022) provided 90 m resolution soil products of China, so it is better to compare your product with their SOC products.

Thanks for your nice suggestions. It is true that comparing with the 90 m resolution soil products would have made the comparison more meaningful, but unfortunately there was a problem with the 90 m data during the application process and we could not obtain them. However, as noted in the answer to the previous comment, we compared our results with the HWSD and Li’s prediction, which is also valuable.

Line 418: It is better to add an additional paragraph in the discussion section, and give limitations and directions for future work.

We have added a paragraph to the discussion section presenting limitations and directions for future work (line 409).

 


Reviewer 3 Report (Previous Reviewer 3)

I'm reviewing "Improved Surface Soil Organic Carbon Mapping of SoilGrids250m Using Sentinel-2 Spectral Images in the Qinghai-Tibetan Plateau".

The proposed article is well written and clearly presented.

The materials used are adequately described and the applied procedures clearly presented.

The most relevant variable in the prediction model (i.e. SOC_250) comes from Liu et al., 2022. Liu's article explores the link between environment and soil properties and relies on Landsat (both 5 and 8) and MODIS, plus further climate and topography inputs.

The SOC pattern predicted by Liu et al. seems to differ from the proposed one; this is consistent with the different model applied.

It would be of interest for the reader to explore the source of such differences.

You could include SoilGrids250m in Table 4 as an additional model (it is meaningful to compare it against both the calibration and validation datasets, since it should return the same performance). This would clearly point out the improvement offered by the proposed method.

Author Response

Reviewer #3: I'm reviewing "Improved Surface Soil Organic Carbon Mapping of SoilGrids250m Using Sentinel-2 Spectral Images in the Qinghai-Tibetan Plateau".

The proposed article is well written and clearly presented.

The materials used are adequately described and the applied procedures clearly presented.

The most relevant variable in the prediction model (i.e. SOC_250) comes from Liu et al., 2022. Liu's article explores the link between environment and soil properties and relies on Landsat (both 5 and 8) and MODIS, plus further climate and topography inputs.

The SOC pattern predicted by Liu et al. seems to differ from the proposed one; this is consistent with the different model applied.

It would be of interest for the reader to explore the source of such differences.

You could include SoilGrids250m in Table 4 as an additional model (it is meaningful to compare it against both the calibration and validation datasets, since it should return the same performance). This would clearly point out the improvement offered by the proposed method.

Response to the comments:

Thanks for your valuable suggestions. We have added the predicted SOC values and SOC_250m to Table 4 so that they can be compared on both the calibration and validation datasets. In addition, analysis related to SOC_250m has been added to Sections 4.1 and 4.2.


Round 2

Reviewer 3 Report (Previous Reviewer 3)

The revised version of the submitted manuscript "Improved Surface Soil Organic Carbon Mapping of SoilGrids250m Using Sentinel-2 Spectral Images in the Qinghai-Tibetan Plateau" has some improvement in the text exposition. 

Some likely internal writing notes are left in the submitted version, e.g. "???" at line 361 and "??????" at line 268; you should read through the text to resolve such occurrences.

 

The comparison of RF and the original data is presented in lines 268 to 270.

Without details about how you performed the comparison (e.g. did you upscale/average the RF outputs to match the SoilGrids250m cell size?), I would not consider RF to have stronger prediction ability: the RF RMSE shown in Table 4 is higher for the validation dataset.

On that consideration I would say that proposed RF model has the main strength of providing a viable downscaling option from 250m to 20m.
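
The aggregation step the reviewer asks about can be sketched as follows. This is only an illustration, not the authors' procedure: it assumes the fine-resolution predictions are held as a plain 2-D list, that the coarse cell size is an integer multiple of the fine one (20 m to 250 m is not, so a real workflow would need proper resampling), and that missing cells are marked `None`. The function name `block_mean` and the sample values are hypothetical.

```python
def block_mean(grid, factor):
    """Average the non-missing values in each factor x factor block of a 2-D grid."""
    n_rows, n_cols = len(grid), len(grid[0])
    coarse = []
    for r0 in range(0, n_rows, factor):
        row = []
        for c0 in range(0, n_cols, factor):
            # Collect valid fine cells that fall inside this coarse cell.
            values = [
                grid[r][c]
                for r in range(r0, min(r0 + factor, n_rows))
                for c in range(c0, min(c0 + factor, n_cols))
                if grid[r][c] is not None
            ]
            # A coarse cell with no valid fine cells stays missing.
            row.append(sum(values) / len(values) if values else None)
        coarse.append(row)
    return coarse


# Hypothetical fine-resolution predictions (made-up values).
fine = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
    [13.0, 14.0, 15.0, 16.0],
]
coarse = block_mean(fine, 2)
```

Once both maps share a cell size, their errors against the same validation samples can be compared on equal terms.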

 

Author Response

Response to the comments:

On my side, the "???" at line 361 and the "??????" at line 268 that you pointed out appear as empty lines. I think I misunderstood your review comments in the first round: I compared the RF predicted values with the SOC250m data, so the difference in the results was not significant, and in Table 4 the SOC250m RMSE was even lower than that of RF. In the comparative analysis I combined RMSE, R2 and RPD and concluded that RF has higher prediction accuracy, which is why you had doubts about the result in the second round of review. We have now reanalyzed the data, comparing our measured values with SOC250m, and corrected the results. I think it is now more convincing to conclude that our prediction accuracy is higher than that of SOC250m.


Round 3

Reviewer 3 Report (Previous Reviewer 3)

The current revision of the proposed paper shows an effective performance boost in predicting SOC as it is stated in the conclusions.
The corrected row in Table 4, as the author states, raises the significance of the experiment presented.

I consider the proposed version of the manuscript suitable for publication.

 

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.


Round 1

Reviewer 1 Report

The manuscript titled “Retrieval and Mapping of Surface Soil Organic Carbon Using Sentinel-2 Spectral Images in the Qinghai-Tibetan Plateau” represents a valuable contribution to investigations of the global carbon cycle and soil quality, especially to surface SOC evaluation through modeling and mapping. There is a need to obtain sufficiently accurate SOC information even in less accessible, complex and data-poor regions, through the development of efficient methods. In this study, the authors aimed to use remote sensing data as well as existing soil data and field measurements to map SOC in the Qinghai-Tibetan Plateau. They applied different machine learning methods to predict and map SOC in the whole area.

Research objective was to use four machine learning methods, partial least squares regression (PLSR), support vector machines (SVM), random forest (RF) and artificial neural network (ANN) to build spatial prediction models based on 396 soil samples and various combinations of covariates.

The investigation is novel, pragmatically planned and set up to find efficient solution for SOC mapping in similar regions, topic is currently very interesting and important in terms of climate change and building inventory of global soil carbon stocks.

The introduction is understandable and progressively gives insight into the subject, supported by a relevant literature review. Methods are well described and divided into proper subsections. I suggest that the description of the models in Section 2.5 also explain the datasets used to develop each of the four models (reference data, predictor variables such as Sentinel-2 data, and other covariates). Results are clear, but the presentation of model results using different covariates needs some improvement. I suggest adding some data (see the specific comments below). References are adequate and relevant to this study.

Line 24: change “predicated” to “based”

Line 26: Isn’t “relative percent difference” something else? Should it be “relative prediction deviation” according to the formula? In M&M, add a reference for the RPD classification or categorization.

Line 27-28: put abbrev. for bands and VIs

Line 30: state this more clearly. On which statistical parameters is the comparison based?

Line 65-67: provide main results

Line 92: change “effective” to, for example, “possible solution”.

Line 95: aim 1) …and spectral and environmental covariates (suggest to add); regarding aim 2, please, mention in M&M interpolation statistics, especially in comparison of maps

Line 109-110: southeast is mentioned twice. Please correct

Section 2.2.: define sampling scheme. What factors were used for selecting sampling scheme?

Line 122: What statistical parameters were used for comparison with other soil maps?

Line 129-133: explain what the sampling station represents. Is it five nearby stations for each of the 396 samples? It is not clear. Explain the sampling scheme you used. Which criteria did you use for choosing this specific sampling plan? Stratified sampling?

Line 162: change B8A to B8

Line 163: this statement needs reference

Sections 2.5.1. to 2.5.4.: in each of the described machine learning models, please explain datasets you used, or how did you build models with selected covariates, what were the inputs.

Line 220: abbrev. for VIP

Line 224: as mentioned in general comments, did you mean “residual prediction deviation” for RPD, used to find prediction error with variation in the data? Put reference for RPD categorization.

Line 246: kurtosis was high… did you mean “lower”?

Line 252: put reference

Line 257: please explain “strong generalization and consistency”

Table 4: what independent data (different environmental covariates) did you use for models represented in this table (SOC-250, bands or VIs, or all together)?

Line 265 and Figure 4, line 343: change to variable importance

Line 366 to 368: modify the sentence as “For instance, Gomez, et al. [71] reported that hyperspectral proximal and remote sensing had the potential to predict SOC using visible and near infrared reflectance.”

Line 385: put HWSD in reference list

 

Author Response

Response to Reviewer 1 Comments

Comment: The manuscript titled “Retrieval and Mapping of Surface Soil Organic Carbon Using Sentinel-2 Spectral Images in the Qinghai-Tibetan Plateau” represents a valuable contribution to investigations of the global carbon cycle and soil quality, especially to surface SOC evaluation through modeling and mapping. There is a need to obtain sufficiently accurate SOC information even in less accessible, complex and data-poor regions, through the development of efficient methods. In this study, the authors aimed to use remote sensing data as well as existing soil data and field measurements to map SOC in the Qinghai-Tibetan Plateau. They applied different machine learning methods to predict and map SOC in the whole area.

Research objective was to use four machine learning methods, partial least squares regression (PLSR), support vector machines (SVM), random forest (RF) and artificial neural network (ANN) to build spatial prediction models based on 396 soil samples and various combinations of covariates.

The investigation is novel, pragmatically planned and set up to find efficient solution for SOC mapping in similar regions, topic is currently very interesting and important in terms of climate change and building inventory of global soil carbon stocks.

The introduction is understandable and progressively gives insight into the subject, supported by a relevant literature review. Methods are well described and divided into proper subsections. I suggest that the description of the models in Section 2.5 also explain the datasets used to develop each of the four models (reference data, predictor variables such as Sentinel-2 data, and other covariates). Results are clear, but the presentation of model results using different covariates needs some improvement. I suggest adding some data (see the specific comments below). References are adequate and relevant to this study.

Response: Thank you for your valuable comments. We have revised the manuscript as per your valuable comments.

Point 1: Line 24: change “predicated” to “based”

Response 1: Thanks for your nice suggestions. We have revised it.

Point 2: Line 26: Isn’t “relative percent difference” something else? Should it be “relative prediction deviation” according to the formula? In M&M, add a reference for the RPD classification or categorization.

Response 2: Thank you for the comment. RPD should be the “ratio of performance to deviation”. We have revised the manuscript, added a detailed introduction to the RPD classification in Section 2.5.5, and added a reference for RPD.

Point 3: Line 27-28: put abbrev. for bands and VIs

Response 3: The abbreviations have been added for each factor in the section.

Point 4: Line 30: state this more clearly. On which statistical parameters is the comparison based?

Response 4: We compared them based on spatial resolution and map detail.

Point 5: Line 65-67: provide main results

Response 5: We have changed this sentence to “Meng, et al. [13] found hyperspectral sensors can better utilize the spectral properties corresponding to SOC content for organic carbon prediction due to a large number of SOC sensitive bands and narrower bandwidth.”. Please see the revised manuscript.

Point 6: Line 92: change “effective” to, for example, “possible solution”.

Response 6: We have changed “effective” to “possible solution”.

Point 7: Line 95: aim 1) …and spectral and environmental covariates (suggest to add); regarding aim 2, please, mention in M&M interpolation statistics, especially in comparison of maps

Response 7: Thanks for your nice suggestions. We have added the spectral and environmental covariates. As for aim 2, we mentioned it in comparison of maps.

Point 8: Line 109-110: southeast is mentioned twice. Please correct

Response 8: Thank you for the detailed review. The sentence “The shrubs are mainly distributed in the southeast, whereas coniferous and broadleaf forests occur mainly in the south and southeast of the QTP.” is correct as written. We have also added “large bare land exists in the northwest” to Section 2.1, together with the relevant reference. Please see the revised manuscript.

Point 9: Section 2.2.: define sampling scheme. What factors were used for selecting sampling scheme?

Response 9: Because the QTP covers a large area, the field survey attempted to cover as many soil sampling points as possible. Furthermore, we considered the different vegetation types, land uses, topography and soil types across the whole QTP to make sure the sampling points represent various soil types. We have added this to Section 2.2; please see the revised manuscript.

Point 10: Line 122: What statistical parameters were used for comparison with other soil maps?

Response 10: The statistical parameters are spatial resolution and map details. We have clarified it in section 4.3. Please see the revised manuscript.

Point 11: Line 129-133: explain what the sampling station represents. Is it five nearby stations for each of the 396 samples? It is not clear. Explain the sampling scheme you used. Which criteria did you use for choosing this specific sampling plan? Stratified sampling?

Response 11: We have revised the section on field-measured data by adding a comprehensive explanation of the sampling scheme. A total of 396 surface sampling points were obtained in the QTP from 2019 to 2020 (Figure 1). Three 1 m × 1 m sampling squares were chosen randomly as replicates at each sampling point. Soil samples were obtained at three different locations in each sampling square after aboveground vegetation, surface crusts and plant litter were removed. The three soil samples from the sampling squares were then mixed into a composite soil sample. We did not use stratified sampling and took only surface samples (0-5 cm) from the Qinghai-Tibet Plateau.

Point 12: Line 162: change B8A to B8

Response 12: Sorry for the mistake. We have revised it.

Point 13: Line 163: this statement needs reference

Response 13: Thanks for your nice suggestions. We have added relevant references

Point 14: Sections 2.5.1. to 2.5.4.: in each of the described machine learning models, please explain datasets you used, or how did you build models with selected covariates, what were the inputs.

Response 14: Thank you for pointing this out. Indeed, we have explained the choice of variables in section 2.2 and section 3.2, and all variables were brought into the four models for selecting the optimal model. It would be repetitive to explain each of the models in sections 2.5.1 to 2.5.4.

Point 15: Line 220: abbrev. for VIP

Response 15: Thank you for pointing this out. We have revised the manuscript completely.

Point 16: Line 224: as mentioned in general comments, did you mean “residual prediction deviation” for RPD, used to find prediction error with variation in the data? Put reference for RPD categorization.

Response 16: The ratio of performance to deviation (RPD) is the ratio of the standard deviation (SD) of the reference values to the RMSE. We have added a short description of the RPD in Section 2.5.5 and a reference for the RPD categorization. Please see the marked version of the revised manuscript.

Point 17: Line 246: kurtosis was high… did you mean “lower”?

Response 17: Sorry for the mistake. We have revised it.

Point 18: Line 252: put reference

Response 18: We have added relevant references

Point 19: Line 257: please explain “strong generalization and consistency”

Response 19: We apologize that the meaning of the description was not clear. We wanted to express the consistently high performance of the RF model across R2, RMSE, and RPD. We consider that this description belongs in Section 4.1, so we have deleted it from Section 3.1.

Point 20: Table 4: what independent data (different environmental covariates) did you use for models represented in this table (SOC-250, bands or VIs, or all together)?

Response 20: In selecting the optimal model, all variables were brought into the model. After the RF model was selected as the optimal model, the variables were entered sequentially by VIP sorting from 1 to 15 to find the appropriate mtry and ntree values. The final variables selected for prediction were: SOC_250, B2, B11, SAVI, NDVI, B5, and SATVI.

Point 21: Line 265 and Figure 4, line 343: change to variable importance

Response 21: This has been revised in the manuscript.

Point 22: Line 366 to 368: modify the sentence as “For instance, Gomez, et al. [71] reported that hyperspectral proximal and remote sensing had the potential to predict SOC using visible and near infrared reflectance.”

Response 22: This has been revised in the manuscript.

Point 23: Line 385: put HWSD in reference list

Response 23: Thanks for your nice suggestions. We have added relevant references.
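
The RPD definition given in Response 16 (standard deviation of the reference values divided by the RMSE) can be illustrated with a short sketch. It is not the authors' code: the function names are hypothetical, the sample standard deviation (n − 1 denominator) is an assumption because the responses do not say which form was used, R2 is computed here as one minus the ratio of residual to total sum of squares, and the observed and predicted values are made up.

```python
import math
import statistics


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rpd(observed, predicted):
    """Ratio of performance to deviation: SD of the observations over the RMSE."""
    # Assumption: sample standard deviation (n - 1 denominator).
    return statistics.stdev(observed) / rmse(observed, predicted)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = statistics.fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration only.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 2.5, 3.5, 4.5, 5.5]
scores = {
    "rmse": rmse(observed, predicted),
    "rpd": rpd(observed, predicted),
    "r2": r_squared(observed, predicted),
}
```

Computing the three scores separately for the calibration and validation datasets makes the gap between training and validation performance, raised by Reviewer 2, directly visible.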


Reviewer 2 Report

This manuscript of Yang et al. predicted the SOC spatial distribution of QTP by using field measurements, Soil Information Grids Products (250 m), and Sentinel-2 images with different machine learning methods. Generally, this study is based on good design and intense data analysis and model simulation, and the results provide a case study of digital soil mapping with Sentinel-2 data over complex landscapes. The structure of the manuscript is well designed and the logic is easy to follow. The discussion is detailed and suggestive. I believe that this research is potentially a good contribution for carbon neutrality in the QTP and a reference for the digital mapping of SOC in complex terrain. However, some minor corrections should be further considered to improve the clarity of the manuscript. Therefore, I suggest a minor revision before accepting for publication in Remote Sensing.

1. References 21 It is recommended to supplement relevant references.

2. Lines 65-67 The sentence needs revision.

3. Lines 117-119 Large regions of alpine steppes and meadows cover the plateau, shrubs are primarily distributed in the southeast, and coniferous and broadleaf forests are primarily distributed in the south and southeast of the QTP. This sentence can be modified as ‘The plateau is covered with large areas of alpine steppes and alpine meadows, and shrubs are mainly distributed in the southeast, whereas coniferous and broadleaf forests occur mainly in the south and southeast of the QTP.’

4. In Figure 3, I find that the SOC units are not consistent with the text. Please keep the unit consistency.

5. Line 261 There seems to be something wrong with the format of this line; please revise it. In addition, the R2 value for RF reaches 0.823, which is much higher than for the other models; please explain this in detail in the discussion section.

6. Line 289 Pay attention to singular and plural forms in sentences.

7. Line 252 3.1 Model performance. Please add the formula of Kurtosis and CV in the appendix.

8. In this manuscript, the authors compared their predictions with other existing maps. However, the beginning of Section 4.3 lacks sentences connecting it to the first two discussion sections, so the paragraph lacks continuity.

9. Line 409-413 This part of the discussion is too brief, and the author's purpose of comparing the existing pictures is not very clear, please improve.

10. Line 423 Please replace “Wu’ prediction” with “Wu’s prediction”.

Author Response

Response to Reviewer 2 Comments

Comment: Manuscript titled as “Retrieval and Mapping of Surface Soil Organic Carbon Using Sentinel-2 Spectral Images in the Qinghai-Tibetan Plateau” represents a valuable contribution to investigations of global carbon cycle and soil quality, especially to surface SOC evaluation through modeling and mapping. There is a need to obtain enough accurate SOC information in even less accessible, complex and data-poor regions, through development of efficient methods. In this study, authors aimed to use remote sensing data as well existing soil data and field measurements to map SOC in Qinghai-Tibetan Plateau. They applied different machine learning methods to predict and map SOC in the whole area.

Response: Thank you for your valuable comments. We have revised the manuscript as per your valuable comments.

Point 1: References 21 It is recommended to supplement relevant references

Response 1: Thanks for your nice suggestions. We have added relevant references in line 91. Please see the revised manuscript.

Point 2: Lines 65-67 The sentence needs revision.

Response 2: Thank you for your suggestion. We have revised the manuscript in line 66-67.

Point 3: Lines 117-119 Large regions of alpine steppes and meadows cover the plateau, shrubs are primarily distributed in the southeast, and coniferous and broadleaf forests are primarily distributed in the south and southeast of the QTP. This sentence can be modified as ‘The plateau is covered with large areas of alpine steppes and alpine meadows, and shrubs are mainly distributed in the southeast, whereas coniferous and broadleaf forests occur mainly in the south and southeast of the QTP.’

Response 3: Thank you for this very insightful comment. As suggested by the reviewer, we have revised this sentence as ‘The plateau is covered with large areas of alpine steppes and alpine meadows, and shrubs are mainly distributed in the southeast, whereas coniferous and broadleaf forests occur mainly in the south and southeast of the QTP.’

Point 4: In Figure 3, I find that the SOC units are not consistent with the text. Please keep the unit consistency.

Response 4: Thank you for pointing this out. The units are wrong, and we have fixed the error in Figure 3.

Point 5: Line 261 There seems to be something wrong with the format of this line; please revise it. In addition, the R2 value for RF reaches 0.823, which is much higher than for the other models; please explain this in detail in the discussion section.

Response 5: We have added an explanation for this. “However, it should be noted that the R2 between training and validation had subtle differences, reflecting the problem of over-fitting with limited samples when processing a strong model. Nevertheless, RF achieve high performance in predicting SOC content based on performs regression tasks through multiple decision regression trees, this study determined RF as the most optimal to predict SOC content due to its high predictive performance (Table 3) and flexibility to be operated with Sentinel-2 data.” Furthermore, in Section 4.1 we compared RF with PLSR and SVM to analyze why the R2 value for RF is higher than for the other models.

Point 6: Line 289 Pay attention to singular and plural forms in sentences.

Response 6: This has been revised in the manuscript.

Point 7: Line 252 3.1 Model performance. Please add the formula of Kurtosis and CV in the appendix.

Response 7: Thank you for your suggestion. We have added the description in section 2.5.5 as “We took 75% of the 396 surface sampling points as the calibration dataset and 25% as the validation dataset, and preprocessed the calibration and validation dataset separately, including the mean, Kurtosis, CV, etc.”.

Point 8: In this manuscript, the authors compared their predictions with other existing maps. However, the beginning of Section 4.3 lacks sentences connecting it to the first two discussion sections, so the paragraph lacks continuity.

Response 8: We have revised the beginning of Section 4.3.

Point 9: Line 409-413 This part of the discussion is too brief, and the author's purpose of comparing the existing pictures is not very clear, please improve.

Response 9: Thank you for your helpful suggestion on improving the accessibility of our manuscript. As suggested by the reviewer, we have improved the description of the comparison with existing maps in the discussion section as follows: “Compared to existing soil map datasets, our prediction has a texture and shape consistent with the Sentinel-2 imagery and provides more detail than other soil maps; it is significantly more accurate and represents SOC variations across the QTP well. Thus, it can be inferred that incorporating high-resolution Sentinel-2 images can improve the accuracy of SOC content prediction.”

Point 10: Line 423 Please replace Wu’ prediction with Wu’s prediction

Response 10: “Wu’ prediction” has been changed to “Wu’s prediction”.

Author Response File: Author Response.pdf

Reviewer 3 Report

I'm reviewing the manuscript "Retrieval and Mapping of Surface Soil Organic Carbon Using Sentinel-2 Spectral Images in the Qinghai-Tibetan Plateau"

The proposed article is well written and clearly presented.

The materials used are adequately described and the applied procedures are clearly presented.

The paper addresses the development of a predictive model to map SOC from spatial information (mainly Earth observation data and derived indices). Several machine learning approaches are assessed and compared, and the best one is selected.

Within the experimental design, a potential flaw concerns the variables used as input for the machine learning algorithms.

The most relevant variable in the prediction model (i.e. SOC_250) comes from Liu et al., 2022. Liu's article explores the link between environment and soil properties and relies on Landsat (both 5 and 8) and MODIS, together with climate and topography inputs.

Does the inclusion of Liu's result mean that your model draws on information from Landsat 5, Landsat 8, MODIS, topography and Sentinel-2?

The SOC pattern predicted by Liu et al. seems to differ from the proposed one; this is consistent with the different model applied.

It would be of interest for the reader to explore the sources of such differences.

The concluding statement about the Sentinel-2 data source at lines [435-436] seems overstated when considering the distribution of variable importance in projection values for RF (Fig. 4).

I suggest that the above-mentioned improvements be made before publication.

Author Response

Reviewer 3-Comments-answer

Comment: I'm reviewing the manuscript "Retrieval and Mapping of Surface Soil Organic Carbon Using Sentinel-2 Spectral Images in the Qinghai-Tibetan Plateau"

The proposed article is well written and clearly presented.

The materials used are adequately described and the applied procedures are clearly presented.

The paper addresses the development of a predictive model to map SOC from spatial information (mainly Earth observation data and derived indices). Several machine learning approaches are assessed and compared, and the best one is selected.

Within the experimental design, a potential flaw concerns the variables used as input for the machine learning algorithms.

Response: Thank you for your valuable comments. We have revised the manuscript as per your valuable comments.

Point 1: The most relevant variable in the prediction model (i.e. SOC_250) comes from Liu et al., 2022. Liu's article explored the link between environment and soil properties and relies on Landsat (both 5 and 8) and MODIS, together with climate and topography inputs.

The SOC pattern predicted by Liu et al. seems to differ from the proposed one; this is consistent with the different model applied.

It would be of interest for the reader to explore the source of such differences.

Response 1: Thank you for this very insightful comment. As mentioned by the reviewer, taking into account the spatial and temporal resolution of the QTP, it is hard to use climate and topography as input variables in the model. Meanwhile, Liu’s team released the high-resolution National Soil Information Grids of China at 250 m spatial resolution, which explored the link between environment and soil properties and relies on Landsat (both 5 and 8) and MODIS, together with climate and topography inputs. Thus, the SOC_250 from Liu’s article was resampled to a raster cell size of 20 m and used as an input variable, combined with our field measurements and Sentinel-2 data on the QTP, to predict the distribution of SOC in 2020 more accurately. In particular, both the quantile regression forest they used and the RF we used are ensemble tree-based machine learning models that can handle complex nonlinear relationships and multivariate interactions with high predictive power. We have added an explanation of this in section 4.2.
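
The resampling step described in this response (a 250 m grid brought to a 20 m cell size) can be illustrated with a small sketch. The record does not state which resampling method was used, so nearest-neighbour assignment is an assumption here; the function name `resample_nearest` and the tiny example grid are hypothetical.

```python
def resample_nearest(grid, src_cell_m=250, dst_cell_m=20):
    """Resample a 2-D grid (list of rows) to a finer cell size by nearest neighbour.

    Each fine cell takes the value of the coarse cell containing its centre.
    """
    src_rows, src_cols = len(grid), len(grid[0])
    dst_rows = (src_rows * src_cell_m) // dst_cell_m
    dst_cols = (src_cols * src_cell_m) // dst_cell_m
    out = []
    for i in range(dst_rows):
        # Centre of the fine cell in metres, mapped to the coarse row index.
        src_i = min(int((i + 0.5) * dst_cell_m // src_cell_m), src_rows - 1)
        row = []
        for j in range(dst_cols):
            src_j = min(int((j + 0.5) * dst_cell_m // src_cell_m), src_cols - 1)
            row.append(grid[src_i][src_j])
        out.append(row)
    return out


# Hypothetical 2 x 2 block of SOC_250 values covering 500 m x 500 m.
coarse = [[10.0, 20.0],
          [30.0, 40.0]]
fine = resample_nearest(coarse)  # 25 x 25 grid of 20 m cells
```

Because 250 is not a multiple of 20, each coarse cell maps to either 12 or 13 fine cells along an axis. Nearest-neighbour resampling adds no new information: the 20 m detail in the final map has to come from the Sentinel-2 predictors, which is the point the reviewer raises.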

Point 2: The concluding statement at lines [435-436] seems overstated when considering the distribution of variable importance in projection values for RF.

Response 2: We have revised and reorganized the sentence in the conclusion. Our prediction was highly dependent on the SOC_250 data; thus, it has the highest importance. Among the 14 variables, B2, B11, SAVI, and NDVI also have high VIP values.

Author Response File: Author Response.pdf
