Article
Peer-Review Record

Digital Mapping of Soil Classes Using Ensemble of Models in Isfahan Region, Iran

by Ruhollah Taghizadeh-Mehrjardi 1,2,*, Budiman Minasny 3, Norair Toomanian 4, Mojtaba Zeraatpisheh 5,6, Alireza Amirian-Chakan 7 and John Triantafilis 8
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 3 April 2019 / Revised: 18 May 2019 / Accepted: 24 May 2019 / Published: 28 May 2019
(This article belongs to the Special Issue Digital Soil Mapping of Soil Functions)

Round 1

Reviewer 1 Report

In “Ensemble model for improved digital mapping of soil classes in Isfahan region, Iran” (Manuscript ID: soilsystems-477561), the authors apply a series of statistical and ensemble methods to predict soil classes. The paper tests many modelling options within one study, in the same area and with the same dataset, which allows all of them to be compared fairly.

Specific comments: 

Title could be more appealing.

The abstract is a bit too long. The introductory section could be more concise, and the objective should be stated clearly. There are also too many results; the authors could focus on the most important ones.

 

Introduction

Line 83 to 87: This paragraph could say how ‘Ensemble methods’ have improved predictions when compared with “statistical and data mining approaches through DSM”. It would be an opportunity to give readers an idea of what to expect. How well have ensemble methods been performing? Even if ensemble methods have not been applied to soil classes, applications to continuous soil attributes could be used as examples.

Line 86: Rewrite “…digital soil mapping of soil…” to “…DSM of soil…”

Line 91: “…study were to; ” see the proper use of semicolons throughout the text.

Line 91: The main objectives .... “to describe spatial prediction of USDA soil taxonomic classes”. Is ‘to describe’ the best word for it ?

Line 92: Check the use of colon to introduce a list “… including;”

 

Materials and Methods

Line 110: Were the soil samples collected along roads? If not, which sampling scheme was used?

Line 130: Table 1 should show how representative the Orders, Sub-orders, Great groups and Subgroups are (in km2 of area, for example).

Line 137: Table 2 should give the sources of all datasets. Alternatively, the sources could be clearly stated (with URLs, for example) within the three subsections of 2.3.

Line 137: Table 2: Were the remote sensing attributes derived from atmospherically corrected remote sensing data?

Line 251: Use DSM instead of ‘digital soil mapping’

Line 168: For a clearer text, the subsection ‘2.4. Soil taxonomic class prediction’ could be split into ‘2.4.1 Traditional models…’ and ‘2.4.2 Ensemble methods…’

Line 267 and 268: “….but not yet applied in DSM study [18].” Reference [18] is from 2015. Will the authors keep that reference?

 

Results and Discussion

Line 309: Could there be a table of the sensitivity values? Such a table would present all the sensitivity values in a single format. It would also make it easy to compare the different methods and relate them to the areas of the Orders, Sub-orders, Great groups and Subgroups (how representative they are in the region).

Line 381: In: “It can be due to the number of observations and the soil taxonomic level.” This is the first time the authors comment on the number of observations. What about the previously presented results (from 3.1.1 to 3.1.4)? Are they prone to the same issue? That is why Table 1 should give the area of each soil type.

Line 381: “…and the soil taxonomic level”. The authors could be clearer on that.

Line 389: The subsection “3.1.6. Bayesian Networks” has a mix of Introduction, Materials and Methods and Results. How could it be rearranged?

Line 415 to 416: The lack of a table of sensitivity values made it difficult to track and compare the different results for every soil type throughout the paper.

Line 420: “However the accuracy increased when ensemble.2 model was utilized” Authors should comment on how much effort was needed to achieve a 0.03 gain in OA when comparing RF and the ensemble.2 model. Do we really need it? What sort of risks were taken to have a 0.03 (from 0.87 to 0.90) gain? (This suggestion is the same for line 455 to 457).  

Line 434: Using “USDA soil taxonomic classes”, the authors should comment on how DSM techniques (and the tested models) are able to predict and differentiate (if they are) Sub-orders, Great groups and Subgroups, despite the very complex ideas involved in soil type mapping. For example, for mapping subgroups, the “typic subgroup represents the basic or 'typical' concept of the great group to which the described subgroup belongs”. Soil suborders “within an order are differentiated on the basis of soil properties and horizons which depend on soil moisture and temperature.” How do all the tested models grasp those pedological concepts?

Line 437 to 439: “The results of the present study showed that each specific model was superior in prediction of a specific soil taxonomic class.” Due to what? Does it have anything to do with geomorphic surfaces, soil type representativeness, the area covered by each soil type, categorical or continuous predictors, …?

Line 453: See if “…had more OA...” is ok?

Line 466: “As can be seen in figure 2,...” Why is the Order OA so high compared with the remaining levels? Why are all three lower levels very close to each other in the OA graph?

Line 474: Figure 2. Is ensemble.2 model better suited to Sub-orders, Great groups and Subgroups than Order mapping?  

Line 483 to 484: “… Topographic Wetness Index (TWI) confirm the importance of moisture consequences …” How is the precipitation of 110 mm distributed in the region? How is this information on the role of TWI related to the ‘severity of soil aridity’ (Line 104) in the region?

Line 488: See “…first nine first variables  …”

Line 492: See the use of (-) in “sub-great-group” in figure 3d.

Line 492: Figure 3 does not explain the meaning of the acronyms. Table 2 does not either.

Line 492: How could every variable be linked to its next position in the next box?

Line 494: The sample numbers are Entisols (43) and Aridisols (151). Did they play any role in the mapping quality of Entisols and Aridisols? Were Aridisols better mapped than Entisols?

Line 521: The maps don’t need to have all those coordinates!

Line 521: The authors could work with a zoom on a smaller area, to pinpoint some special features in all these maps.

Finally: What effect did “…the soil profiles in the Entisols Order were classified into the same Sub-order (Orthents), Great Group (Torriothents) and Subgroup (Typic)” (Line 119-120) have on the predictions? Since, consequently, there was no spatial variation within Entisols position in space?!

 

Conclusions

Line 526: I could not understand the idea in this first conclusion sentence: “A number of individual models, including; ANN, MnLR, SVM, RF, DT , BN, and SMnLR to classify 194 soil profiles into the USDA soil taxonomic classes.”   

Organize the conclusion section in the same order (sequence) in which the objectives were proposed in the Introduction section.


Author Response

Dear Editors and Reviewers:

Thank you for your letter and for the reviewers’ comments concerning our manuscript entitled “Ensemble model for improved digital mapping of soil classes in Isfahan region, Iran” (Manuscript ID: soilsystems-477561). The comments were all valuable and very helpful for revising and improving our paper, and they provided important guidance for our research. We have studied the comments carefully and have made corrections that we hope will meet with approval. We have also responded point by point to the reviewers’ comments, as listed below. The revised portions are marked in red in the paper.

1. Title could be more appealing.

The title was changed.

2. The abstract is a bit too long. The introductory section could be more concise, and the objective should be stated clearly. There are also too many results; the authors could focus on the most important ones.

Abstract was improved.

3. Line 83 to 87: This paragraph could say how ‘Ensemble methods’ have improved predictions when compared with “statistical and data mining approaches through DSM”. It would be an opportunity to give readers an idea of what to expect. How well have ensemble methods been performing? Even if ensemble methods have not been applied to soil classes, applications to continuous soil attributes could be used as examples.

The introduction was improved.

4. Line 86: Rewrite “…digital soil mapping of soil…” to “…DSM of soil…”

It was corrected.

5. Line 91: “…study were to; ” see the proper use of semicolons throughout the text.

It was corrected.

6. Line 91: The main objectives .... “to describe spatial prediction of USDA soil taxonomic classes”. Is ‘to describe’ the best word for it ?

It was corrected.

7. Check the use of colon to introduce a list “… including;”

It was corrected.

8. Were the soil samples collected along roads? If not, which sampling scheme was used?

The samples were collected using a stratified random sampling method.

9. Line 130: Table 1 should show how representative the Orders, Sub-orders, Great groups and Subgroups are (in km2 of area, for example).

We added Supplementary A, which gives full information on the geomorphic surfaces, their area coverage, and the number of soil samples in each stratum.

10. Line 137: Table 2 should give the sources of all datasets. Alternatively, the sources could be clearly stated (with URLs, for example) within the three subsections of 2.3.

It was added.

11. Line 137: Table 2: Were the remote sensing attributes derived from atmospherically corrected remote sensing data?

It was added.

12. Line 251: Use DSM instead of ‘digital soil mapping’

It was corrected.

13. Line 168: For a clearer text, the subsection ‘2.4. Soil taxonomic class prediction’ could be split into ‘2.4.1 Traditional models…’ and ‘2.4.2 Ensemble methods…’

It was corrected.

14. Line 267 and 268: “….but not yet applied in DSM study [18].” Reference [18] is from 2015. Will the authors keep that reference?

It was corrected.

15. Line 309: Could there be a table of the sensitivity values? Such a table would present all the sensitivity values in a single format. It would also make it easy to compare the different methods and relate them to the areas of the Orders, Sub-orders, Great groups and Subgroups (how representative they are in the region).

It was added for subgroup.

16. Line 381: In: “It can be due to the number of observations and the soil taxonomic level.” This is the first time the authors comment on the number of observations. What about the previously presented results (from 3.1.1 to 3.1.4)? Are they prone to the same issue? That is why Table 1 should give the area of each soil type.

We decided to remove the discussion of the relationship between soil sampling density and accuracy, because the situation was the same for all models.

17. Line 381: “…and the soil taxonomic level”. The authors could be clearer on that.

It was corrected.

18. Line 389: The subsection “3.1.6. Bayesian Networks” has a mix of Introduction, Materials and Methods and Results. How could it be rearranged?

It was corrected and moved to the Methods section.

19. Line 415 to 416: The lack of a table of sensitivity values made it difficult to track and compare the different results for every soil type throughout the paper.

It was added for subgroup.

20. Line 420: “However the accuracy increased when ensemble.2 model was utilized” Authors should comment on how much effort was needed to achieve a 0.03 gain in OA when comparing RF and the ensemble.2 model. Do we really need it? What sort of risks were taken to have a 0.03 (from 0.87 to 0.90) gain? (This suggestion is the same for line 455 to 457). 

The reviewer is correct to point out that the accuracy improvement of the ensemble.2 method is small for the prediction of soil order. The overall accuracy increased by only 3% compared with the single best method.

However, the OA increased by 35% at the sub-order level, 27% at the great group level, and 29% at the subgroup level.

Similarly, the Kappa index improved by 16%, 60%, 44%, and 43% for order, sub-order, great group and subgroup, respectively.

The improvement is especially substantial at the sub-order level. The ensemble method only requires several models to be combined, which is not much additional work.
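
For readers following the numbers in this exchange, the two accuracy measures (OA and the Kappa index) and the difference between an absolute and a relative gain can be illustrated with a short sketch. This is not the authors' code: the confusion matrix is hypothetical, the function names are illustrative, and only the order-level OA values (0.87 and 0.90) are taken from the review.

```python
def overall_accuracy(cm):
    """Share of correctly classified samples: diagonal sum over total."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohen_kappa(cm):
    """Cohen's kappa: agreement beyond what row/column totals give by chance."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(n)) / total
    expected = sum(
        sum(cm[i]) * sum(row[i] for row in cm) for i in range(n)
    ) / total ** 2
    return (observed - expected) / (1 - expected)


def gains(single_best, ensemble):
    """Return (absolute gain, relative gain) of the ensemble over one model."""
    absolute = ensemble - single_best
    return absolute, absolute / single_best


# Hypothetical 2-class confusion matrix (rows: observed, columns: predicted).
cm = [[40, 10],
      [5, 45]]
oa = overall_accuracy(cm)      # 85 of 100 samples on the diagonal
kappa = cohen_kappa(cm)        # chance agreement here is 0.5

# Order-level OA values quoted in the review: RF 0.87, ensemble.2 0.90.
abs_gain, rel_gain = gains(0.87, 0.90)
```

With the order-level values, the absolute gain is 0.03 and the relative gain is about 3.4%, so it matters which of the two a reported percentage refers to.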

21. Line 434: Using “USDA soil taxonomic classes”, the authors should comment on how DSM techniques (and the tested models) are able to predict and differentiate (if they are) Sub-orders, Great groups and Subgroups, despite the very complex ideas involved in soil type mapping. For example, for mapping subgroups, the “typic subgroup represents the basic or 'typical' concept of the great group to which the described subgroup belongs”. Soil suborders “within an order are differentiated on the basis of soil properties and horizons which depend on soil moisture and temperature.” How do all the tested models grasp those pedological concepts?

This is due to the nature of the mathematical model, which we cannot explain.

22. Line 437 to 439: “The results of the present study showed that each specific model was superior in prediction of a specific soil taxonomic class.” Due to what? Does it have anything to do with geomorphic surfaces, soil type representativeness, the area covered by each soil type, categorical or continuous predictors, …?

This is due to the nature of the mathematical model, which we cannot explain.

23. Line 453: See if “…had more OA...” is ok?

It was corrected.

24. Line 466: “As can be seen in figure 2,...” Why is the Order OA so high compared with the remaining levels? Why are all three lower levels very close to each other in the OA graph?

At the order level, because of the small number of soil classes, the classes can be predicted quite well by most models, so the accuracy improvement is also small. At the lower levels of the taxonomy, more soil classes have to be predicted, and thus the accuracy becomes lower.

25. Line 474: Figure 2. Is ensemble.2 model better suited to Sub-orders, Great groups and Subgroups than Order mapping? 

Yes. As explained earlier, prediction at the Order level is already accurate (OA = 0.87), so there is little room for improvement. At the other levels, prediction by a single model is still poor (OA < 0.5), and thus there is room for improvement.

26. Line 483 to 484: “… Topographic Wetness Index (TWI) confirm the importance of moisture consequences …” How is the precipitation of 110 mm distributed in the region? How is this information on the role of TWI related to the ‘severity of soil aridity’ (Line 104) in the region?

We added some references.

27. Line 488: See “…first nine first variables  …”

It was corrected

28. Line 492: See the use of (-) in “sub-great-group” in figure 3d.

It was corrected

29. Line 492: Figure 3 does not explain the meaning of the acronyms. Table 2 does not either.

It was corrected

30. Line 492: How could every variable be linked to its next position in the next box?

It was corrected

31. Line 494: The sample numbers are Entisols (43) and Aridisols (151). Did they play any role in the mapping quality of Entisols and Aridisols? Were Aridisols better mapped than Entisols?

We did not consider the number of samples and its effect on the accuracy.

32. Line 521: The maps don’t need to have all those coordinates!

It was corrected

33. Line 521: The authors could work with a zoom on a smaller area, to pinpoint some special features in all these maps.

It was corrected

34. Finally: What effect did “…the soil profiles in the Entisols Order were classified into the same Sub-order (Orthents), Great Group (Torriothents) and Subgroup (Typic)” (Line 119-120) have on the predictions? Since, consequently, there was no spatial variation within Entisols position in space?!

This is due to the nature of the mathematical model, which we cannot explain.

35. Line 526: I could not understand the idea in this first conclusion sentence: “A number of individual models, including; ANN, MnLR, SVM, RF, DT , BN, and SMnLR to classify 194 soil profiles into the USDA soil taxonomic classes.”  

It was corrected

36. Organize the conclusion section in the same order (sequence) in which the objectives were proposed in the Introduction section.

It was corrected

 


Reviewer 2 Report

1) As I understand it, the authors have 194 sampling points and 44 geomorphic surface codes, approximately 4.4 samples per code. Since I cannot see the data, I have a question: how many different soil types are located on each geomorphic surface? What is the performance of a naive classifier that answers with the majority soil type for each geomorphic surface? Does every geomorphic surface have samples? A cross table (codes vs. soil types) and/or models with the geomorphic surface codes excluded from the variables could clarify the situation.

2) Since most (all?) of the methods used provide predicted class probabilities, it is better to use ROC analysis (AUC) as the performance metric, since it does not depend on the threshold, unlike the sensitivity and specificity used.
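
The cross table and naive baseline requested in point 1 could be sketched as follows. This is a minimal illustration and not the authors' analysis: the surface codes and class labels are made-up placeholders, and the function names `cross_table` and `majority_baseline_loo` are hypothetical. The baseline is scored leave-one-out, an assumption made here so that, with only about 4.4 samples per code, a profile never votes for its own class.

```python
from collections import Counter, defaultdict


def cross_table(surfaces, classes):
    """Count profiles for every (geomorphic surface code, soil class) pair."""
    return Counter(zip(surfaces, classes))


def majority_baseline_loo(surfaces, classes):
    """Leave-one-out accuracy of a majority-class-per-surface classifier.

    Each profile is predicted as the most common class among the OTHER
    profiles on the same surface. If it is the only profile on its surface,
    the most common class among all other profiles is used instead.
    """
    per_surface = defaultdict(Counter)
    for s, c in zip(surfaces, classes):
        per_surface[s][c] += 1
    overall = Counter(classes)

    hits = 0
    for s, c in zip(surfaces, classes):
        local = per_surface[s].copy()
        local[c] -= 1                 # hold out the current profile
        local = +local                # drop classes whose count fell to zero
        if not local:                 # no other profile on this surface
            local = overall.copy()
            local[c] -= 1
            local = +local
        if local and local.most_common(1)[0][0] == c:
            hits += 1
    return hits / len(classes)


# Made-up example: 8 profiles on 3 hypothetical surface codes.
surfaces = ["S1", "S1", "S1", "S1", "S2", "S2", "S2", "S3"]
classes = ["Aridisols", "Aridisols", "Aridisols", "Entisols",
           "Entisols", "Entisols", "Entisols", "Aridisols"]

table = cross_table(surfaces, classes)
baseline = majority_baseline_loo(surfaces, classes)
```

Comparing such a baseline with the fitted models would show how much of the reported accuracy comes from the geomorphic surface codes alone.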

Author Response

1. As I understand it, the authors have 194 sampling points and 44 geomorphic surface codes, approximately 4.4 samples per code. Since I cannot see the data, I have a question: how many different soil types are located on each geomorphic surface? What is the performance of a naive classifier that answers with the majority soil type for each geomorphic surface? Does every geomorphic surface have samples? A cross table (codes vs. soil types) and/or models with the geomorphic surface codes excluded from the variables could clarify the situation.

It was corrected. We added Supplementary A, which gives full information on the geomorphic surfaces, their area coverage, and the number of soil samples in each stratum.

 

2. Since most (all?) of the methods used provide predicted class probabilities, it is better to use ROC analysis (AUC) as the performance metric, since it does not depend on the threshold, unlike the sensitivity and specificity used.

It was added.
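
The threshold-free metric discussed in point 2 can be illustrated with a short sketch. It computes a one-vs-rest AUC per class directly from predicted class probabilities, using the rank interpretation of AUC (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one). The probabilities, class names and function names are hypothetical; the record does not say how the authors computed AUC.

```python
def binary_auc(scores, is_positive):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(prob_rows, true_classes, class_names):
    """Per-class AUC: each class in turn is 'positive', all others 'negative'."""
    result = {}
    for k, name in enumerate(class_names):
        scores = [row[k] for row in prob_rows]
        labels = [c == name for c in true_classes]
        result[name] = binary_auc(scores, labels)
    return result


# Hypothetical predicted probabilities for 4 profiles and 2 classes.
class_names = ["Aridisols", "Entisols"]
probs = [[0.9, 0.1],
         [0.6, 0.4],
         [0.3, 0.7],
         [0.7, 0.3]]
truth = ["Aridisols", "Aridisols", "Entisols", "Entisols"]

auc = one_vs_rest_auc(probs, truth, class_names)
```

Because only the ranking of the probabilities enters the calculation, no classification threshold has to be chosen, which is the property the reviewer asks for.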

