Article
Peer-Review Record

Evaluation of Soil Nutrient Status Based on LightGBM Model: An Example of Tobacco Planting Soil in Debao County, Guangxi

Appl. Sci. 2022, 12(23), 12354; https://doi.org/10.3390/app122312354
by Zhipeng Liang 1, Tianxiang Zou 1, Jialin Gong 1, Meng Zhou 1, Wenjie Shen 1,2,3,*, Jietang Zhang 4, Dongsheng Fan 5 and Yanhui Lu 5
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 21 September 2022 / Revised: 21 November 2022 / Accepted: 23 November 2022 / Published: 2 December 2022
(This article belongs to the Special Issue New Advances and Illustrations in Applied Geochemistry)

Round 1

Reviewer 1 Report

The authors used a LightGBM model to explore soil nutrient status, which can provide a useful reference for the development of modern agriculture. The article as a whole is well written but can be improved with minor adjustments. After these corrections, the manuscript will be complete enough to be published in the prestigious Applied Sciences journal.

#1 Abstract: Every abstract should contain only relevant information that summarizes the work; terms like "in a word"... are useless. Also add more quantitative information where relevant. I recommend that the authors use the following reference to adjust their abstract (https://doi.org/10.1016/j.carbon.2007.07.009).

#2 There are several typographical and grammatical mistakes which should be corrected, e.g., in ‘…Du’an town, Dongling town and Jingde town’ a comma is missing. Please check all mistakes.

#3 The novelty of the work is not expressed explicitly. In what aspect is this work original and better than others? Please try to be clearer about the novelty of the work.

#4 The last paragraph of the introduction must be dedicated to explaining the main goal of the work as well as how it was reached.

#5. In Tab. 3, the mean values are represented without the standard error. Was more than one measurement taken? Was there statistical analysis? I am convinced that the average values must be compared to substantiate a real result.

#6 Although the conclusion can be listed by more than 1 item, it must be written in a single running paragraph. Please fix it.

Author Response

Dear Reviewer:

    Thank you for your letter and for the reviewers’ comments concerning our manuscript entitled “Evaluation of Soil Nutrient Status Based on LightGBM Model: An Example of Tobacco Planting Soil in Debao County, Guangxi” (ID: applsci-1955296).

    These comments are all valuable and very helpful for revising and improving our paper, and they provide important guidance for our research. We have studied the comments carefully and have made corrections which we hope will meet with approval. Revised portions are marked in red in the paper. The main corrections in the paper and the responses to the reviewer’s comments are as follows:

Responses to the reviewer’s comments:

  1. Response to comment: #1 Abstract: Every abstract should contain only relevant information that summarizes the work; terms like "in a word"... are useless.

      Response: As the reviewer pointed out, the abstract contained redundant parts. We have corrected it according to the reviewer’s comments.

  2. Response to comment: #2 There are several typographical and grammatical mistakes which should be corrected, e.g., in ‘…Du’an town, Dongling town and Jingde town’ a comma is missing. Please check all mistakes.

      Response: The punctuation mark in the town name is there to distinguish Du'an Town from Duan Town. We apologize for our negligence on this point.

  3. Response to comment: #3 The novelty of the work is not expressed explicitly. In what aspect is this work original and better than others? Please try to be clearer about the novelty of the work.

      Response: The novelty of the work lies mainly in using machine learning methods to solve agricultural problems, which is more innovative than traditional research methods. This is mainly addressed in the introduction.

  4. Response to comment: #4 The last paragraph of the introduction must be dedicated to explaining the main goal of the work as well as how it was reached.

      Response: We have carefully revised the end of the introduction. The changes are in the part of the uploaded file marked in red.

  5. Response to comment: #5. In Tab. 3, the mean values are represented without the standard error. Was more than one measurement taken? Was there statistical analysis? I am convinced that the average values must be compared to substantiate a real result.

      Response: The average values of the nutrient factors were all obtained through multiple calculations and analyses.

  6. Response to comment: #6 Although the conclusion can be listed by more than 1 item, it must be written in a single running paragraph. Please fix it.

      Response: The paragraphs of the article have been modified. The changes are in the part of the uploaded file marked in red.

Special thanks to you for your good comments.

 

Yours sincerely,

Zhipeng Liang

Corresponding author:

Wenjie Shen

                                                                    

Author Response File: Author Response.docx

Reviewer 2 Report

Review Comments for Applied Sciences manuscript applsci-1955296, entitled "Evaluation of Soil Nutrient Status Based on LightGBM Model: An Example of Tobacco Planting Soil in Debao County, Guangxi", submitted by Liang et al. (2022).

General Comments

The authors of this manuscript dealt with the topic regarding the evaluation of soil nutrient status in seven tobacco fields based on LightGBM Model. For me, this manuscript is easy to follow and generally the quality of the manuscript is good. In this case, I recommend the article for publication in the journal Applied Sciences after addressing the following issues.

Specific Comments

-The Introduction Section, Paragraph 3: At present, you give too many examples of the applications of other algorithms. I suggest that you delete one or two citations and add brief descriptions of the principles and usage scenarios of the model that you used in this study.

-Line 88 Please add the objectives of this study and the clear hypotheses.

-Line 91 Please add the longitude and latitude of this research area.

-Line 95 To my knowledge, light, medium and heavy loam are not included in the FAO classification of soil texture. Please make it clear.

-In the whole text, please correct the units of soil nutrients from “g/kg or mg/kg or cmol/kg” to “g kg⁻¹ or mg kg⁻¹ or cmol kg⁻¹”.

-I noticed that the correlation analysis was used in this manuscript. However, the statistical analyses section is missing, please add this section and all relevant descriptions of the statistical methods used in the current manuscript.

-The Discussion Section needs to be further improved. At present, you simply describe the results but do not conduct an in-depth analysis of what causes such findings or point out the potential significance of the results obtained. Meanwhile, an analysis of the uncertainties and limitations of the models you used in this study is lacking.

-Moreover, I only have two suggestions for the Discussion Section. One is that the linkage to the proposed hypotheses in the Discussion must be strongly improved. Another is that the authors should add a statement of implications or ramifications of the results to end the Discussion section. What do the results mean in a larger context?

-Finally, this manuscript contains a large number of grammatical errors and poor wording that detract from its overall quality. Please check the manuscript and refine the language carefully. I would recommend publication after further editing to improve its readability and grammatical correctness.

Author Response

Dear Reviewer:

    Thank you for your letter and your comments concerning our manuscript entitled “Evaluation of Soil Nutrient Status Based on LightGBM Model: An Example of Tobacco Planting Soil in Debao County, Guangxi” (ID: applsci-1955296).

    These comments are all valuable and very helpful for revising and improving our paper, and they provide important guidance for our research. We have studied the comments carefully and have made corrections which we hope will meet with approval. Revised portions are marked in red in the paper. The main corrections in the paper and the responses to the reviewer’s comments are as follows:

Responses to the reviewer’s comments:

  1. -The Introduction Section, Paragraph 3: At present, you give too many examples of the applications of other algorithms. I suggest that you delete one or two citations and add brief descriptions of the principles and usage scenarios of the model that you used in this study.

Response: This suggestion is valuable and helpful for improving our paper. We have modified the paragraph accordingly. Thank you for your advice.

  2. -Line 88 Please add the objectives of this study and the clear hypotheses.

Response: Thank you for your advice. We have revised the last paragraph of the introduction.

  3. -Line 91 Please add the longitude and latitude of this research area.

Response: We are very sorry for our incorrect writing and we have carefully revised this.

  4. -Line 95 To my knowledge, light, medium and heavy loam are not included in the FAO classification of soil texture. Please make it clear.

Response: We have revised it: “The soil texture types are mainly clay and loam in the study.”

  5. -In the whole text, please correct the units of soil nutrients from “g/kg or mg/kg or cmol/kg” to “g kg⁻¹ or mg kg⁻¹ or cmol kg⁻¹”.

Response: Thank you for your advice. Because each nutrient factor has different test values, expressing them in a unified unit would make some values in the table much larger or smaller. Our reading of the literature shows that multiple units coexist in this way. Moreover, such a layout does not affect the establishment of the later models.

  6. -I noticed that the correlation analysis was used in this manuscript. However, the statistical analyses section is missing, please add this section and all relevant descriptions of the statistical methods used in the current manuscript.

Response: Because this article mainly conducts research through the establishment of the model, we do not have much space to describe the correlation analysis here. We have nevertheless made some modifications. Thank you very much for your comments.

  7. -The Discussion Section needs to be further improved. At present, you simply describe the results but do not conduct an in-depth analysis of what causes such findings or point out the potential significance of the results obtained. Meanwhile, an analysis of the uncertainties and limitations of the models you used in this study is lacking.

       -Moreover, I only have two suggestions for the Discussion Section. One is that the linkage to the proposed hypotheses in the Discussion must be strongly improved. Another is that the authors should add a statement of implications or ramifications of the results to end the Discussion section. What do the results mean in a larger context?

Response: We appreciate your comments and have made some changes. Unlike previous research articles on machine learning, our article pays more attention to application. Therefore, when combining models with agricultural research, the discussion of nutrient factors mainly serves the model.

  8. -Finally, this manuscript contains a large number of grammatical errors and poor wording that detract from its overall quality. Please check the manuscript and refine the language carefully. I would recommend publication after further editing to improve its readability and grammatical correctness.

Response: We are very sorry for our incorrect writing. We have carefully revised this and reviewed the whole article many times.

Special thanks to you for your good comments.

 

Yours sincerely,

Zhipeng Liang

Corresponding author:

Wenjie Shen

                                                                   

Author Response File: Author Response.docx

Reviewer 3 Report

Line 61 – Font error in word “central”.

Line 62 – Font error in word “based”.

Line 72 – 8167 hm², or km²?

Line 92 – hydrothermal or hydrological?

Line 95 – The soil texture types are mainly clay, light, medium and heavy loam – this is not correct.

Line 103 – Technical specification of balanced soil fertilization by soil testing and Soil Testing of the Ministry of Agriculture – citation please!

Line 126 – Evaluation criteria of abundance and deficiency for nutrients in tobacco planting soil. – by whom? Citation please.

Line 162 – value of about 6.9, which is generally weak alkaline – This is not true! Under 7.00 pH is acidic!

Line 305 – The contents of soil organic matter, available K, total N, available P, exchangeable Ca and exchangeable Mg, and other nutrient factors are high? According to Table 2, Evaluation criteria of abundance and deficiency for nutrients in tobacco planting soil, the contents are moderate!
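
Two of the points above (Line 72 and Line 162) are simple checks that can be reproduced. The following minimal Python sketch converts the quoted area between hm², ha and km², and applies the reviewer's rule that a pH below 7.00 is acidic. The function names and the single neutral point of 7.0 are assumptions made for illustration only; soil classification schemes often use bands around neutrality rather than one cut-off.

```python
# Area units: 1 hm = 100 m, so 1 hm² = 10,000 m² = 1 ha; 1 km² = 1,000,000 m².
M2_PER_HM2 = 100 * 100
M2_PER_HA = 10_000
M2_PER_KM2 = 1_000 * 1_000


def hm2_to_ha(area_hm2):
    """Convert square hectometres to hectares (numerically identical)."""
    return area_hm2 * M2_PER_HM2 / M2_PER_HA


def hm2_to_km2(area_hm2):
    """Convert square hectometres to square kilometres."""
    return area_hm2 * M2_PER_HM2 / M2_PER_KM2


def ph_reaction(ph, neutral=7.0):
    """Label a pH value relative to an assumed neutral point (hypothetical helper)."""
    if ph < neutral:
        return "acidic"
    if ph > neutral:
        return "alkaline"
    return "neutral"


area = 8167  # value quoted by the reviewer for Line 72
print(hm2_to_ha(area), "ha")
print(hm2_to_km2(area), "km²")
print("pH 6.9 is", ph_reaction(6.9))
```

Under these definitions the area is the same number in hm² and ha, which is why replacing hm² with ha changes the notation but not the value.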

 

 

Author Response

Dear Reviewer:

    Thank you for your letter and for the reviewers’ comments concerning our manuscript entitled “Evaluation of Soil Nutrient Status Based on LightGBM Model: An Example of Tobacco Planting Soil in Debao County, Guangxi” (ID: applsci-1955296).

    These comments are all valuable and very helpful for revising and improving our paper, and they provide important guidance for our research. We have studied the comments carefully and have made corrections which we hope will meet with approval. Revised portions are marked in red in the paper. The main corrections in the paper and the responses to the reviewer’s comments are as follows:

Responses to the reviewer’s comments:

  1. Line 61 – Font error in word “central”.

     Response: We are very sorry for our incorrect writing. We have carefully revised this and reviewed the whole article many times. Following the comments of other reviewers, the passage containing this word has been revised and the word removed.

  2. Line 62 – Font error in word “based”.

     Response: We are very sorry for our incorrect writing. We have carefully revised this and reviewed the whole article many times. Following the comments of other reviewers, the passage containing this word has been revised and the word removed.

  3. Line 72 – 8167 hm², or km²?

     Response: We changed hm² into ha.

  4. Line 92 – hydrothermal or hydrological?

     Response: Hydrothermal.

  5. Line 95 – The soil texture types are mainly clay, light, medium and heavy loam – this is not correct.

     Response: This part has been modified. The modified text reads: “The soil texture types are mainly clay and loam in the study.”

  6. Line 103 – Technical specification of balanced soil fertilization by soil testing and Soil Testing of the Ministry of Agriculture – citation please!

     Response: Modified.

  7. Line 126 – Evaluation criteria of abundance and deficiency for nutrients in tobacco planting soil. – by whom? Citation please.

     Response: The evaluation criteria for nutrient abundance and deficiency of tobacco planting soil are taken from Integrated Management of Tobacco Planting Soil and Tobacco Nutrients in China.

  8. Line 162 – value of about 6.9, which is generally weak alkaline – This is not true! Under 7.00 pH is acidic.

     Response: We are very sorry for our incorrect writing and we have carefully revised this.

  9. Line 305 – The contents of soil organic matter, available K, total N, available P, exchangeable Ca and exchangeable Mg, and other nutrient factors are high? According to Table 2, Evaluation criteria of abundance and deficiency for nutrients in tobacco planting soil, the contents are moderate!

     Response: We are very sorry for our incorrect writing. We have carefully revised this and reviewed the whole article many times.

Special thanks to you for your good comments.

Yours sincerely,

Zhipeng Liang

Corresponding author:

Wenjie Shen

                                                                   

Author Response File: Author Response.docx

Reviewer 4 Report

General comments

The title of the paper starts with ‘Evaluation of Soil Nutrient Status’. The objective is ‘to analyze the main nutrient features of tobacco planting… and provides scientific reference for rational layout of tobacco planting. At the same time, the LightGBM model also provides new and innovative methods for the research of other problems in agricultural field.’

First, authors should explain why their approach is scientific while others did not appear to be appropriate (l. 46-52). ‘To analyze features’ is not an objective but a means to meet the objective. ‘Tobacco planting’ is too general. Do you mean ‘tobacco fertilization’? As stated, the ‘objective’ appears to be a modelling exercise rather than a useful model.

Soil fertility classification is established using crop response to fertilizers in relation to potential yield as target variable and site-specific features. These key variables have not been documented in the data set to support tobacco farmers’ decisions in relation to soil fertility classification. Soil fertility models are based on soil fertility trials. There are no such trials on tobacco (alkaline soils) to support soil fertility classification for tea (acid soils!). Even then, the LightGBM model is asserted in the first place to be the best model without any comparison with other models. If LightGBM is a means to prove something, then authors should formulate hypotheses relevant to the problem under study. Hypotheses are accepted or rejected, and this is the basis to draw conclusions explicitly.

Now, what are the authors trying to prove? Table 2 presents five soil fertility classes for tea using a paper not readily accessible or readable by most readers. They have soil test values in hand for tobacco, although soil tests for Cu, Mn and Fe are missing (any scientific reason for this?). Is there a challenge to transfer information on soil tests from tea to tobacco for such different soils and regions? If soil test results for tobacco have been already categorized by the authors according to the tea classification, then why check if the classification is correct (this would be rather trivial, isn’t it)? What is the meaning of Figure 4 given 9 features? What is the accuracy of the training model compared to its application to the testing model? Is there overfitting of the training model? What is the purpose of the cross-validation model? This paper brings little to an objective of sustainable tobacco planting.

Minor comments

l. 41 : the biosphere

l. 40-45: too general introduction if the intended readership is agricultural science.

l. 73: better to use more familiar ha (hectares) than square hectometers

l. 82-87: this is a strange objective. Soil nutrients are classified qualitatively for soil test interpretation to help farmers understand the numbers provided by soil testing laboratories. The purpose is to propose the nutrient dosage most appropriate to reach potential crop yield.

l. 87-88: this is out of context and should be moved to the Discussion section if relevant.

l. 92: 1000 m (spacing)

l. 93: ‘good’ for what?

l. 93: 1462.5 mm (spacing)

l. 95: 1325 h (spacing)

l. 95: what do you mean by ‘heavy loam’? In general, we use ‘heavy’ for ‘heavy clay’, not for ‘heavy loam’.

l. 96-97: move part of this sentence to l. 93 to define ‘good’

Table 1: I can trace all methods but ‘available K and P’. How were K and P extracted? I understand that P is quantified by colorimetry (the blue or the yellow method?) but K is generally quantified by emission spectroscopy. Can you clarify?

Table 1: support the NY/T and LY/T methods by references easily accessible by most readers in English.

l. 126: provide a reference for this classification. Ref 28 (l. 220) is a guide for tea, not tobacco. This looks like a short hop of confounding apples and oranges.

Tables 1-2: why didn’t you measure Cu, Mn and Fe?

Table 2: ‘rich’ cannot qualify a soil. Use ‘high’.

Table 2: why is the measurement unit mg/kg for K and cmol/kg for Ca and Mg? If you adopt cmol/kg as the unit, specify whether cmol refers to a single charge (cmolc).

l. 143: the sentence seems to finish after ‘growing area’ and another sentence to start with ‘The data’

l. 141-145: In absence of experimental data where field trials relate dosage to soil test to establish soil fertility groups as in Table 2, observational data may provide the right ‘recipe’ to reach high-quality crop yields. Why didn’t you document the fertilizer doses?

l. 153-154: could you explain the purpose of pre-processing the data by PCA and cluster analysis?

l. 155: indicate the proportion of the data set that has been split into training and testing data sets. Was there overfitting of the training data on the testing data? In l. 217-218, it is indicated as 70:30. If I multiply 1038 total observations by 0.3 to set apart the testing data set, I obtain 311 observations for testing, not 290. If I divide 290 by 0.3, I obtain 967 total observations. Where have the 71 missing observations gone? Were they rejected as outliers?

Figure 2: where is cross-validation used to build the model after checking for possible training/testing overfitting?

l. 166, 170, 177, 195 and elsewhere: ‘rich’ cannot qualify a soil (social sciences vs. physical sciences). Use ‘high’.

l. 163-176: spacing preceding the unit

Figure 3: show nutrients (Ca, Mg, B, Zn) instead of letters (a-d) on each histogram.

l. 194: ‘Where’ instead of ‘When’

l. 197-199: this explanation is far from being convincing because you confound total N, which is mainly made of organic N, with inorganic N. Indeed, soil organic matter contains organic C and N often in close proportions (C/N ratio).

Section 3.2: what is the purpose of correlation analysis? Recognizing close correlations among features, then what should the next step be?

l. 214: classification ability?

l. 218-220: move to the M&M section

l. 219: why did you make five soil fertility categories while you have numbers that provide more accurate features?

l. 222-227: define explicitly what you mean by ‘True’. There are 9 features. If the target variable is category, features have already been classified by category, so what?

l. 225: I counted 324 samples in total, of which 294 are ‘true’. Why didn’t you use the whole data set in cross-validation? Is there overfitting?
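
The sample-count arithmetic raised in the comments on l. 155 and l. 225 can be reproduced in a few lines. The sketch below is illustrative only: every number is taken from the reviewer's comments, not from the manuscript's data, and the variable names are assumptions.

```python
# Figures quoted in the reviewer's comments (l. 155 and l. 225).
TOTAL_OBSERVATIONS = 1038
TEST_FRACTION = 0.30          # from the stated 70:30 split
REPORTED_TEST_SIZE = 290
CONFUSION_TOTAL = 324         # samples the reviewer counted in the confusion matrix
CONFUSION_CORRECT = 294       # samples the reviewer counted as 'true'

# Test-set size expected from the full data set at a 70:30 split.
expected_test_size = round(TOTAL_OBSERVATIONS * TEST_FRACTION)

# Total data-set size implied by a test set of 290 at the same split.
implied_total = round(REPORTED_TEST_SIZE / TEST_FRACTION)

# Observations not accounted for by the reported split.
unaccounted = TOTAL_OBSERVATIONS - implied_total

# Share of correctly classified samples in the reviewer's count.
accuracy = CONFUSION_CORRECT / CONFUSION_TOTAL

print(f"expected test size: {expected_test_size}")
print(f"implied total:      {implied_total}")
print(f"unaccounted:        {unaccounted}")
print(f"accuracy:           {accuracy:.3f}")
```

The three counts quoted by the reviewer (290, 311 and 324) do not agree with one another, which is the inconsistency the comments ask the authors to explain.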

Author Response

Dear Reviewer:

    Thank you for your letter and your comments concerning our manuscript entitled “Evaluation of Soil Nutrient Status Based on LightGBM Model: An Example of Tobacco Planting Soil in Debao County, Guangxi” (ID: applsci-1955296).

    These comments are all valuable and very helpful for revising and improving our paper, and they provide important guidance for our research. We have studied the comments carefully and have made corrections which we hope will meet with approval. Revised portions are marked in red in the paper. The main corrections in the paper and the responses to your comments are as follows:

Responses to the reviewer’s comments:

1. General comments

Point 1: First, authors should explain why their approach is scientific while others did not appear to be appropriate (l. 46-52). ‘To analyze features’ is not an objective but a means to meet the objective. ‘Tobacco planting’ is too general. Do you mean ‘tobacco fertilization’? As stated, the ‘objective’ appears to be a modelling exercise rather than a useful model.

Response 1: Thank you very much for your comments. In this study, the use of the LightGBM model to study tobacco planting is innovative compared with traditional methods.

Point 2: Soil fertility classification is established using crop response to fertilizers in relation to potential yield as target variable and site-specific features. These key variables have not been documented in the data set to support tobacco farmers’ decisions in relation to soil fertility classification. Soil fertility models are based on soil fertility trials. There are no such trials on tobacco (alkaline soils) to support soil fertility classification for tea (acid soils!). Even then, the LightGBM model is asserted in the first place to be the best model without any comparison with other models. If LightGBM is a means to prove something, then authors should formulate hypotheses relevant to the problem under study. Hypotheses are accepted or rejected, and this is the basis to draw conclusions explicitly.

Response 2: In this paper, the soil fertility model is based on soil fertility tests. Our main concern is the application of LightGBM to tobacco planting soil. As a method, it can be applied to different fields, as some of the literature shows. For example, studying the correlation between the main nutrient factors of tobacco planting soil and rice soil fertility allows the two fields to learn from each other. Of course, as you said, there is no such tobacco (alkaline soil) trial to support the soil fertility classification of tea (acid soil).

Secondly, in this study we focus on agricultural application and want to express the accuracy of the model through the confusion matrix. Because of limited space, we did not compare LightGBM with other models. In future research, we will certainly strengthen our work in this area. Thank you for your valuable comments.

Point 3: Now, what are the authors trying to prove? Table 2 presents five soil fertility classes for tea using a paper not readily accessible or readable by most readers. They have soil test values in hand for tobacco, although soil tests for Cu, Mn and Fe are missing (any scientific reason for this?). Is there a challenge to transfer information on soil tests from tea to tobacco for such different soils and regions? If soil test results for tobacco have been already categorized by the authors according to the tea classification, then why check if the classification is correct (this would be rather trivial, isn’t it)? What is the meaning of Figure 4 given 9 features? What is the accuracy of the training model compared to its application to the testing model? Is there overfitting of the training model? What is the purpose of the cross-validation model? This paper brings little to an objective of sustainable tobacco planting.

Response 3: First of all, in this study we mainly selected nine nutrient factors, namely pH value, organic matter, total N, available K, available P, exchangeable Mg, exchangeable Ca, available B, and available Zn, as the data set used to establish the model, without taking Cu, Mn, and Fe into account. Because many nutrient factors affect soil fertility, we consulted the literature and retained the nutrient factors that are important for tobacco growth. This is because our research aims to evaluate the nutrient status of tobacco planting soil rather than its fertility. In future research, we will strengthen the research on Cu, Mn, and Fe.

Secondly, if only nutrient factors are used as the data set to build the model, we think that this model can also be used in tea research. Of course, this model is our first attempt at an application in the agricultural field, and any specific application still needs to be verified with data. The verification of the soil classification is mainly used to test the reliability of the model.

Finally, compared with its application in the test model, the training model has high accuracy, and the training model has not been overfitted. The purpose of the cross-validation model is to verify the accuracy and reliability of the model. The goal of sustainable tobacco cultivation has been mentioned in our discussion and conclusions, and we have also revised it. Thank you again for your comments and suggestions on our article.
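
The response above states that the model is not overfitted and that cross-validation verifies its accuracy, but reports no figures. As a generic illustration of how such a check is commonly organized, and not the authors' procedure, the following minimal Python sketch implements k-fold cross-validation and reports training and validation accuracy per fold. The memorizing model, the synthetic labels and all function names are hypothetical stand-ins; in practice the `fit` and `predict` callables would wrap the actual classifier.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is held out exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for held_out in range(k):
        val_idx = folds[held_out]
        train_idx = [i for f in range(k) if f != held_out for i in folds[f]]
        yield train_idx, val_idx


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_validate(fit, predict, X, y, k=5, seed=0):
    """Return one (train_accuracy, validation_accuracy) pair per fold."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(X), k, seed):
        X_tr, y_tr = [X[i] for i in train_idx], [y[i] for i in train_idx]
        X_va, y_va = [X[i] for i in val_idx], [y[i] for i in val_idx]
        model = fit(X_tr, y_tr)
        scores.append((accuracy(y_tr, predict(model, X_tr)),
                       accuracy(y_va, predict(model, X_va))))
    return scores


# Hypothetical stand-in model: memorizes the training data, so it overfits by design.
def fit_memorizer(X, y):
    return {"table": dict(zip(X, y)), "default": Counter(y).most_common(1)[0][0]}


def predict_memorizer(model, X):
    return [model["table"].get(x, model["default"]) for x in X]


# Synthetic data: labels are random, so the features carry no real signal.
rng = random.Random(42)
X_demo = list(range(200))
y_demo = [rng.choice(["low", "medium", "high"]) for _ in X_demo]

for train_acc, val_acc in cross_validate(fit_memorizer, predict_memorizer, X_demo, y_demo):
    print(f"train accuracy = {train_acc:.2f}, validation accuracy = {val_acc:.2f}")
```

A large gap between the two columns, as the memorizing model should produce here, is the usual symptom of overfitting; similar training and validation accuracies across folds would support the claim that a model generalizes.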

 

2. Minor comments

Point 1: l. 41: the biosphere

Response 1: We are very sorry for our incorrect writing and we have carefully revised this.

Point 2: l. 40-45: too general introduction if the intended readership is agricultural science.

Response 2: Thank you for your suggestion. We have made changes in the introduction.

Point 3: l. 73: better to use more familiar ha (hectares) than square hectometers

Response 3: We have modified it to ha.

Point 4: l. 82-87: this is a strange objective. Soil nutrients are classified qualitatively for soil test interpretation to help farmers understand the numbers provided by soil testing laboratories. The purpose is to propose the nutrient dosage most appropriate to reach potential crop yield.

Response 4: We have carefully revised it.

Point 5: l. 87-88: this is out of context and should be moved to the Discussion section if relevant.

Response 5: We have moved it to the discussion section.

Point 6: l. 92: 1000 m (spacing)

Response 6: We are very sorry for our incorrect writing and we have carefully revised it.

Point 7: l. 93: ‘good’ for what?

Response 7: ‘Good’ means that the local climate is favorable and suitable for planting tobacco.

Point 8: l. 93: 1462.5 mm (spacing)

Response 8: We are very sorry for our incorrect writing and we have carefully revised it.

Point 9: l. 95: 1325 h (spacing)

Response 9: We are very sorry for our incorrect writing and we have carefully revised it.

Point 10: l. 95: what do you mean by ‘heavy loam’? In general, we use ‘heavy’ for ‘heavy clay’, not for ‘heavy loam’.

Response 10: We have modified this. The modified text reads: “The soil texture types are mainly clay and loam in the study.”

Point 11: l. 96-97: move part of this sentence to l. 93 to define ‘good’

Response 11: We have modified this.

Point 12: Table 1: I can trace all methods but ‘available K and P’. How were K and P extracted? I understand that P is quantified by colorimetry (the blue or the yellow method?) but K is generally quantified by emission spectroscopy. Can you clarify?

Response 12: P is quantified by colorimetry (the blue method). This method follows NY/T 1848-2010.

Na⁺ in the combined extractant exchanges with NH₄⁺ and K⁺ on the surface of the soil colloids, and these ions enter the solution together with the water-soluble ions. The potassium ions in the leaching solution react with sodium tetraphenylborate to form a stable potassium tetraphenylborate precipitate, which makes the solution turbid. Within a certain concentration range, the turbidity is proportional to the potassium content of the solution and is measured at a wavelength of 685 nm.

Point 13: Table 1: support the NY/T and LY/T methods by references easily accessible by most readers in English.

Response 13: Because many agricultural standards are referenced, the sources of the methods have been simplified in the article. Listing all of them in the references would harm the layout. However, if a change is necessary, they can be added in the form of an appendix.

Point 14: l. 126: provide a reference for this classification. Ref 28 (l. 220) is a guide for tea, not tobacco. This looks like a short hop of confounding apples and oranges.

Response 14: According to the Integrated Management of Tobacco Planting Soil and Tobacco Nutrients in China, the grading standard for the abundance and deficiency of the different nutrient factors is shown in Table 2.

Point 15: Tables 1-2: why didn’t you measure Cu, Mn and Fe?

Response15: In this study, we mainly selected nine nutrient factors (pH value, organic matter, total N, available K, available P, exchangeable Mg, exchangeable Ca, available B, and available Zn) as the test set to establish the model, without taking Cu, Mn, and Fe into account. Because many nutrient factors affect soil fertility, we consulted the literature and retained the nutrient factors that are important for tobacco growth. This is because our research direction is to evaluate the nutrient status of tobacco planting soil rather than its fertility. In future research, we will strengthen the research on Cu, Mn, and Fe.

Point 16: Table 2: ‘rich’ cannot qualify a soil. Use ‘high’.

Response16: We are very sorry for our incorrect writing and we have carefully revised it.

Point 17: Table 2: why is the measurement unit mg/kg for K and cmol/kg for Ca and Mg? If you adopt cmol/kg as the unit, specify whether cmol refers to a single charge (cmolc).

Response17: Thank you for your advice. Because each nutrient factor has a different range of test values, expressing them all in a unified unit would make some values in the table very large or very small. The literature we consulted also uses multiple units side by side. Moreover, such a layout does not affect the establishment of the later models.
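To make the unit question concrete, the following sketch converts between mg/kg and cmolc/kg (centimoles of charge per kilogram) for the three exchangeable cations. It assumes standard molar masses and ionic charges; the function names are hypothetical and the example values are not taken from Table 2.

```python
# Molar mass (g/mol) and ionic charge for each cation.
CATIONS = {"K": (39.098, 1), "Ca": (40.078, 2), "Mg": (24.305, 2)}


def mg_per_kg_to_cmolc_per_kg(element, mg_per_kg):
    """Convert a mass concentration to centimoles of charge per kg of soil."""
    molar_mass, charge = CATIONS[element]
    mmol_per_kg = mg_per_kg / molar_mass   # mg/kg divided by g/mol gives mmol/kg
    mmolc_per_kg = mmol_per_kg * charge    # moles of charge, not moles of ions
    return mmolc_per_kg / 10.0             # 10 mmol = 1 cmol


def cmolc_per_kg_to_mg_per_kg(element, cmolc_per_kg):
    """Inverse conversion, from centimoles of charge per kg to mg/kg."""
    molar_mass, charge = CATIONS[element]
    return cmolc_per_kg * 10.0 * molar_mass / charge


for element in CATIONS:
    print(element, round(cmolc_per_kg_to_mg_per_kg(element, 1.0), 2), "mg/kg per cmolc/kg")
```

The factor differs by element (about 391 mg/kg for K but about 200 mg/kg for Ca per cmolc/kg), which is why stating whether cmol refers to a single charge matters.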

Point 18: l. 143: the sentence seems to finish after ‘growing area’ and another sentence to start with ‘The data’

Response18: We have modified this.

Point 19: l. 141-145: In absence of experimental data where field trials relate dosage to soil test to establish soil fertility groups as in Table 2, observational data may provide the right ‘recipe’ to reach high-quality crop yields. Why didn’t you document the fertilizer doses?

Response19: In this study, we are more interested in studying the current soil fertility nutrient status than in influencing the soil fertility trend through manual intervention. So we did not record the amount of fertilizer applied.

Point 20: l. 153-154: could you explain the purpose of pre-processing the data by PCA and cluster analysis?

Response20: PCA and cluster analysis are used for data preprocessing because the later model requires classification.

Point 21: l. 155: indicate the proportion of the data set that has been split into training and testing data sets. Was there overfitting of the training data on the testing data? In l. 217-218, it is indicated 70:30. If I multiply 1038 total observations by 0.3 to set apart the testing data set, I obtain 311 observations for testing, not 290. If I divide 290 by 0.3, I obtain 967 total observations. Where have the 71 missing observations gone? Were they rejected as outliers?

Response21: The 1038 total observations form the training set, and the 290 test samples come from the study area. No observations are missing.
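The reviewer's arithmetic and the authors' reading of the sample counts can be checked side by side. This is a small sketch using only the numbers quoted in Point 21 and Response 21; the variable names are illustrative.

```python
total_reported = 1038   # observations quoted in Point 21
test_reported = 290     # test samples quoted in Point 21
test_fraction = 0.30    # the 70:30 split cited from l. 217-218

# Reviewer's reading: the 290 test samples are 30% of one pooled data set.
expected_test = round(total_reported * test_fraction)   # 311
implied_total = round(test_reported / test_fraction)    # 967
apparent_gap = total_reported - implied_total            # 71

# Authors' reading: 1038 training samples plus 290 independent test samples.
combined = total_reported + test_reported                # 1328
test_share = test_reported / combined

print(expected_test, implied_total, apparent_gap)
print(f"test share under the authors' reading: {test_share:.3f}")
```

Under the authors' reading, the 290 samples make up roughly 22% of all samples, so the 70:30 ratio cannot describe the split between these two sets.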

Point 22: Figure 2: where is cross-validation used to build the model after checking for possible training/testing overfitting?

Response22: Thank you very much for your comments. During the establishment of our model, there was no overfitting. In cross-validation, 70% of the training data are used for validation. When the validation rate is greater than 90%, the model can be established.

Point 23: l. 166, 170, 177, 195 and elsewhere: ‘rich’ cannot qualify a soil (social sciences vs. physical sciences). Use ‘high’.

Response23: We are very sorry for our incorrect writing and we have carefully revised it.

Point 24: l. 163-176: spacing preceding the unit

Response24: We are very sorry for our incorrect writing and we have carefully revised it.

Point 25: Figure 3: show nutrients (Ca, Mg, B, Zn) instead of letters (a-d) on each histogram.

Response25: We are very sorry for our incorrect writing and we have carefully revised it.

Point 26: l. 194: ‘Where’ instead of ‘When’

Response26: We are very sorry for our incorrect writing and we have carefully revised it.

Point 27: l. 197-199: this explanation is far from being convincing because you confound total N, which is mainly made of organic N, with inorganic N. Indeed, soil organic matter contains organic C and N often in close proportions (C/N ratio).

Response27: Thank you very much for your comments. In the future research, we will strengthen the research in this area.

Point 28: Section 3.2: what is the purpose of correlation analysis? Recognizing close correlations among features, then what should the next step?

Response28: Correlation analysis is mainly to prove that there is an interaction relationship between soil nutrient factors, and it also indicates that the assessment of soil nutrient status is a comprehensive assessment.

Point 29: l. 214: classification ability?

Response29: Yes.

Point 30: l. 218-220: move to the M&M section

Response30: We have modified this.

Point 31: l. 219: why did you make five soil fertility categories while you have numbers that provide more accurate features?

Response31: It can be seen from articles. Compared with the numerical value, the classification of soil fertility is more intuitive, more applicable, and has better practical significance.

Point 32: l. 222-227: define explicitly what you mean by ‘True’. There are 9 features. If the target variable is a category, the features have already been classified by category, so what?

Response32: True is the accurate value of the prediction. After classification, the next step will be performed according to the category.

Point 33: l. 225: I counted 324 samples in total, of which 294 are ‘true’. Why didn’t you use the whole data set in cross-validation? Is there overfitting?

Response33: There is no overfitting.
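For reference, the counts the reviewer cites in Point 33 imply the following overall accuracy. This is plain arithmetic on the reviewer's numbers (324 samples, 294 counted as ‘true’), not a result reported by the authors.

```python
total_samples = 324     # reviewer's count of samples at l. 225
correct_samples = 294   # samples the reviewer counted as 'true'

accuracy = correct_samples / total_samples
misclassified = total_samples - correct_samples
error_rate = misclassified / total_samples

print(f"accuracy = {accuracy:.3f}, misclassified = {misclassified}, error rate = {error_rate:.3f}")
```

A single accuracy figure on one data set cannot by itself show whether the model overfits; that requires comparing performance on training data with performance on data held out from training.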

Special thanks to you for your good comments.

 

Yours sincerely,

Zhipeng Liang

Corresponding author:

Wenjie Shen

                                                                   

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

Dear Editors and Authors:

Many thanks to the authors for the improved version. However, some further improvements are necessary; see below.

-please add the clear hypotheses regarding your study.

-please strongly improve the linkage to the proposed hypotheses.

-this manuscript still contains a large number of grammatical errors and poor wording that detract from its overall quality. Please check the manuscript and refine the language carefully. I would recommend publication after further editing to improve its readability and grammatical correctness.

Author Response

Dear Reviewer:

Thank you for your letter and your comments again concerning our manuscript entitled “Evaluation of Soil Nutrient Status Based on LightGBM Model: An Example of Tobacco Planting Soil in Debao County, Guangxi” (ID: applsci-1955296).

We have studied the comments carefully and have made corrections that we hope meet with approval. Revised portions are marked in red in the paper. The main corrections in the paper and the responses to your comments are as follows:

Responses to the reviewer’s comments:

1. General comments

Point 1 and 2: please add the clear hypotheses regarding your study.

Response 1 and 2: Thank you very much for your comments. I added this part at the end of the introduction. The added part:

In this study, the data for 9 nutrient factors in tobacco planting soil from 7 towns in Debao County (Yandong town, Longguang town, Najia town, Zurong town, Du'an town, Dongling town and Jingde town) are preprocessed by principal component analysis and then used as the test set to build the LightGBM model. The feasibility of the LightGBM model is demonstrated by a confusion matrix, and the differences in importance among the soil nutrient factors are obtained by eigenvalue analysis. Through the classification and prediction of the LightGBM model, the nutrient status of tobacco planting soil in the study area is evaluated automatically. Therefore, using the LightGBM model to study the nutrient status of tobacco planting soil can provide a scientific reference for improving soil fertility in the local tobacco industry and for the rational layout of tobacco planting.

 

Point 3: this manuscript still contains a large number of grammatical errors and poor wording that detract from its overall quality. Please check the manuscript and refine the language carefully. I would recommend publication after further editing to improve its readability and grammatical correctness.

Response 3: Dear Reviewer, based on your suggestions on this aspect of my article, I have carefully reviewed it from the beginning and made the corresponding modifications. Because of my level of English, I may have missed some problems. I sincerely ask you to point them out, and I will make serious efforts to revise the article again. Thank you again for your reply.

Author Response File: Author Response.docx

Reviewer 4 Report

Most points are still unanswered:

General comments: 1-2-3

Minor comments: 2-4-12-13-14-15-17-21-22-27-31-33

 

Author Response

Dear Reviewer:

Thank you for your letter and your comments again concerning our manuscript entitled “Evaluation of Soil Nutrient Status Based on LightGBM Model: An Example of Tobacco Planting Soil in Debao County, Guangxi” (ID: applsci-1955296).

We have studied the comments carefully and have made corrections that we hope meet with approval. Revised portions are marked in red in the paper. The main corrections in the paper and the responses to your comments are as follows:

Responses to the reviewer’s comments:

1. General comments

Point 1: First, authors should explain why their approach is scientific while others did not appear to be appropriate (l. 46-52). ‘To analyze features’ is not an objective but a means to meet the objective. ‘Tobacco planting’ is too general. Do you mean ‘tobacco fertilization’? As stated, the ‘objective’ appears to be a modelling exercise rather than a useful model.

Response 1:

Firstly, this study does not claim that others' research methods are inappropriate. Rather, using machine learning methods to study agriculture-related issues is innovative and more in line with current technological development.

Secondly, the characteristics of the soil nutrient factors are analyzed so that they correspond to the model's subsequent assessment of nutrient status, which makes the article more convincing.

Thirdly, this model is not based solely on the tobacco planting soil of Debao County. Instead, 1038 sample points from Baise City, Hechi City and Hezhou City were used to train and verify the model, and the nutrient status of tobacco planting soil in Debao County was tested after a good model was obtained. Therefore, it is not merely a training exercise but an effective model that can be used.

 

Point 2: Soil fertility classification is established using crop response to fertilizers in relation to potential yield as target variable and site-specific features. These key variables and the have not been documented in the data set to support tobacco farmers’ decisions in relation with soil fertility classification. Soil fertility models are based on soil fertility trials. There are no such trials on tobacco (alkaline soils) to support soil fertility classification for tea (acid soils!). Even then, the LightGBM model is asserted in the first place to be the best model without any comparison with other models. If LightGBM is a means to prove something, then authors should formulate hypotheses relevant to the problem under study. Hypotheses are accepted or rejected, and this is the basis to draw conclusions explicitly.

Response 2:

Firstly, soil fertility results from the combined influence of many factors. In addition to the common nutrient factors, these include climate factors, terrain factors, etc. Therefore, this study mainly examines the nutrient status of tobacco planting soil rather than its fertility. It would not be rigorous to claim that soil fertility can be studied using 9 soil nutrient factors alone.

Secondly, the main contribution of this study is the innovative application of a method. Because the LightGBM model itself has classification ability, it can be applied to other fields. For example, the correlation between the main nutrient factors of tobacco planting soil and rice soil fertility could be studied so that the two fields learn from each other.

Furthermore, LightGBM was also compared with other models during model establishment. Because the article focuses more on agricultural application, the accuracy of the model is represented by a confusion matrix. In addition, the hypotheses of the study have been revised at the end of the introduction.

 

Point 3: Now, what are the authors trying to prove? Table 2 presents five soil fertility classes for tea using a paper not readily accessible or readable by most readers. They have soil test values in hand for tobacco, although soil tests for Cu, Mn and Fe are missing (any scientific reason for this?). Is there a challenge to transfer information on soil tests from tea to tobacco for such different soils and regions? If soil test results for tobacco have been already categorized by the authors according to the tea classification, then why check if the classification is correct (this would be rather trivial, isn’t it)? What is the meaning of Figure 4 given 9 features? What is the accuracy of the training model compared to its application to the testing model? Is there overfitting of the training model? What is the purpose of the cross-validation model? This paper brings little to an objective of sustainable tobacco planting.

Response 3:

Firstly, Table 2 shows the evaluation criteria for nutrient abundance and deficiency of flue-cured tobacco planting soil rather than tea [1]. The content of each nutrient factor is studied and analyzed according to this standard.

Secondly, in this study, we did not consider copper, manganese and iron. Because many nutrient factors affect soil fertility, we consulted the articles [2-4] and retained the nutrient factors that are important and common for tobacco growth. Our research direction is to evaluate the nutrient status of tobacco planting soil, rather than its fertility. Therefore, we chose the appropriate nutrient factors.

Thirdly, the article does not apply the tea classification to tobacco planting soil. From the article on tea, we borrowed only the method of principal component analysis and classification. The overall modeling is based on 1038 sample points from Baise City, Hechi City and Hezhou City. In other words, research on tobacco planting soil is applied to tobacco planting soil. The model is verified in order to make it more feasible.

Fourthly, the 9 characteristics mainly correspond to the 9 soil nutrient factors. Figure 4 shows the confusion matrix of the model, which is mainly used to check the accuracy of the model.
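As a minimal sketch of how a confusion matrix is used to check accuracy, the following builds one from true and predicted grades and reads the overall accuracy off its diagonal. The five integer grades and the ten sample labels are hypothetical placeholders, not data from Figure 4.

```python
from collections import Counter

GRADES = [1, 2, 3, 4, 5]  # assumed coding of the five nutrient-status grades


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true grades, columns are predicted grades."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def overall_accuracy(matrix):
    """Share of samples on the diagonal, i.e. predicted grade equals true grade."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical labels for ten samples, for illustration only.
y_true = [1, 1, 2, 2, 3, 3, 3, 4, 4, 5]
y_pred = [1, 2, 2, 2, 3, 3, 4, 4, 4, 5]

matrix = confusion_matrix(y_true, y_pred, GRADES)
for row in matrix:
    print(row)
print("overall accuracy:", overall_accuracy(matrix))
```

Off-diagonal cells show which grades are confused with each other, which a single accuracy value does not reveal.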

Fifthly, compared with its application to the test model, the training model has high accuracy, and it has not been overfitted. The purpose of the cross-validation model is to verify the accuracy and reliability of the model.

Sixthly, we have mentioned the goal of sustainable tobacco cultivation in our discussions and conclusions, and we have also revised it.

Finally, thank you again for your comments and suggestions on our article.

Reference

[1] Chen J.H.; Liu J.L.; Li Z.H. Soil and nutrient status of tobacco growing in China. Integrated Management of Tobacco Planting Soil and Tobacco Nutrients in China. Science Press: Beijing, 2008:39-55. (In Chinese)

[2] Li Z.L.; Lu Y.C.; Zhao L.F.; Fan D.S.; Wei Z.; Zhou W.L.; Huang L.G.; Huang Y.; Huang J.P.; Gu X.Q.; Nian F.Z. Evaluation on Tobacco-planting Soil Fertility in Longlin County of Guangxi. Chinese Journal of Soil Science. 2020,51(05):1042-1048. (In Chinese)

[3] Gao H.J.; Wei Z.; Luo G.; Lin B.S. Status and Conservation and Remediation Technology of Tobacco-growing Soils in Baise City. Crop Research. 2016,30(06):736-740. (In Chinese)

[4] Hu H.Z.; Wang H.J.; Liu B.F.; Yao Z.D.; Liu X.M.; Jiang C.G.; Niu Z.X.; Qu H.F.; Fang T. Comprehensive Evaluation of Soil Fertility in Panxian Tobacco-Growing Areas of Guizhou Province. Chinese Agricultural Science Bulletin. 2012,28(19):109-116. (In Chinese)

 

2. Minor comments

Point 2: l. 40-45: too general introduction if the intended readership is agricultural science.

Response2:

We have made changes in the preface. The added part:

Soil, as the basic environment for crop growth and an important means of agricultural production, is the primary guarantee for the sustainable development of the biosphere [1,2]. The abundance or shortage of soil nutrients greatly affects crop quality and is one of the important factors in the development of crop farming [3]. Influenced by landform, climate, altitude and so on, soil nutrients differ among regions [4]. The level of soil nutrients is affected not only by the independent role of each nutrient factor but also by the comprehensive coordination of the various nutrient factors [5]. Therefore, a comprehensive evaluation of soil nutrient status gives a deeper understanding of the current nutrient characteristics of the soil, which has important guiding significance for farming and fertilization in agricultural areas.

 

Point 4: l. 82-87: this is a strange objective. Soil nutrients are classified qualitatively for soil test interpretation to help farmers understand the numbers provided by soil testing laboratories. The purpose is to propose the nutrient dosage most appropriate to reach potential crop yield.

Response4:

According to your suggestion, we have carefully revised this. On this basis, the process and assumptions of model application are added. The added part is after the introduction.

 

Point 12: Table 1: I can trace all methods but ‘available K and P’. How were K and P extracted? I understand that P is quantified by colorimetry (the blue or the yellow method?) but K is generally quantified by emission spectroscopy. Can you clarify?

Response12:

P is quantified by colorimetry (the blue method).

Emission spectroscopy is mainly used to measure the available potassium content of corn, rice and other crops, and is rarely used to test the available potassium in soil.

All explanations refer to NY/T 1848-2010.

 

Point 13: Table 1: support the NY/T and LY/T methods by references easily accessible by most readers in English.

Response13:

According to your suggestion, we have carefully revised this. The red mark of the article shows the modified part.

 

Point 14: l. 126: provide a reference for this classification. Ref 28 (l. 220) is a guide for tea, not tobacco. This looks like a short hop of confounding apples and oranges.

Response14:

Here, we mainly use this classification method for reference, not to confuse tea and tobacco. Because on the whole, we build the model through a large amount of existing data, just learn from their research methods, and then apply them to the classification of tobacco nutrient status.

 

Point 15: Tables 1-2: why didn’t you measure Cu, Mn and Fe?

Response15:

In this study, we mainly selected nine nutrient factors (pH value, organic matter, total N, available K, available P, exchangeable Mg, exchangeable Ca, available B, and available Zn) as the test set to establish the model, without taking Cu, Mn, and Fe into account.

Because many nutrient factors affect soil fertility, we consulted the articles [1-3] and retained the nutrient factors that are important for tobacco growth. This is because our research direction is to evaluate the nutrient status of tobacco planting soil rather than its fertility. In future research, we will strengthen the research on Cu, Mn, and Fe.

Reference

[1] Li Z.L.; Lu Y.C.; Zhao L.F.; Fan D.S.; Wei Z.; Zhou W.L.; Huang L.G.; Huang Y.; Huang J.P.; Gu X.Q.; Nian F.Z. Evaluation on Tobacco-planting Soil Fertility in Longlin County of Guangxi. Chinese Journal of Soil Science. 2020,51(05):1042-1048. (In Chinese)

[2] Gao H.J.; Wei Z.; Luo G.; Lin B.S. Status and Conservation and Remediation Technology of Tobacco-growing Soils in Baise City. Crop Research. 2016,30(06):736-740. (In Chinese)

[3] Hu H.Z.; Wang H.J.; Liu B.F.; Yao Z.D.; Liu X.M.; Jiang C.G.; Niu Z.X.; Qu H.F.; Fang T. Comprehensive Evaluation of Soil Fertility in Panxian Tobacco-Growing Areas of Guizhou Province. Chinese Agricultural Science Bulletin. 2012,28(19):109-116. (In Chinese)

Point 17: Table 2: why is the measurement unit mg/kg for K and cmol/kg for Ca and Mg? If you adopt cmol/kg as the unit, specify whether cmol refers to a single charge (cmolc).

Response17:

Thank you for your advice. Because each nutrient factor has a different range of test values, expressing them all in a unified unit would make some values in the table very large or very small.

Here, cmol refers to a single charge (cmolc), because the exchange capacity of cations can represent soil fertility. To some extent, the greater the cation exchange capacity, the better the soil fertility.

 

Point 21: l. 155: indicate the proportion of the data set that has been split into training and testing data sets. Was there overfitting of the training data on the testing data? In l. 217-218, it is indicated 70:30. If I multiply 1038 total observations by 0.3 to set apart the testing data set, I obtain 311 observations for testing, not 290. If I divide 290 by 0.3, I obtain 967 total observations. Where have the 71 missing observations gone? Were they rejected as outliers?

Response21:

The 1038 total observations form the training set, and the 290 test samples come from the study area. The two sets are independent. There was no overfitting, and no observations were removed as abnormal values.

 

Point 22: Figure 2: where is cross-validation used to build the model after checking for possible training/testing overfitting?

Response22:

During the establishment of our model, there was no overfitting. Cross-validation is applied to the model after it has been trained.
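For readers unfamiliar with the procedure under discussion, the following sketch shows how k-fold cross-validation partitions a training set so that every sample is held out exactly once. It illustrates the general technique only: the choice of k = 5, the seed and the function name are assumptions, the model fitting step is omitted, and this is not the authors' actual validation scheme.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (training_indices, validation_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]     # k disjoint folds of near-equal size
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


# 1038 is the training-set size quoted in the responses; k = 5 is an assumed value.
splits = list(k_fold_indices(1038, k=5))
for fold_number, (training, validation) in enumerate(splits, start=1):
    print(f"fold {fold_number}: train on {len(training)}, validate on {len(validation)}")
```

In each fold a model would be fitted on the training indices and scored on the validation indices; a large gap between the two scores is the usual sign of overfitting.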

 

Point 27: l. 197-199: this explanation is far from being convincing because you confound total N, which is mainly made of organic N, with inorganic N. Indeed, soil organic matter contains organic C and N often in close proportions (C/N ratio).

Response27:

In the future research, we will strengthen the research in this area.

The added part:

The main reason is that the N content in the soil mainly exists in the form of organic N, and the organic N mainly comes from the inorganic degradation of organic matter.

 

Point 31: l. 219: why did you make five soil fertility categories while you have numbers that provide more accurate features?

Response31:

Compared with the numerical value, the classification of soil fertility is more intuitive, more applicable, and has better practical significance. Secondly, grading the soil nutrient status in the study area can also better show it in the form of pictures, which is more vivid.

 

Point 33: l. 225: I counted 324 samples in total, of which 294 are ‘true’. Why didn’t you use the whole data set in cross-validation? Is there overfitting?

Response33:

In cross-validation, because of the large amount of data, it is not possible to validate all the data. In the process of calculation, there is no overfitting, and the result is good.

Special thanks to you for your good comments.

 

Yours sincerely,

Zhipeng Liang

Corresponding author:

Wenjie Shen

                                                                   

Author Response File: Author Response.docx
