Article
Peer-Review Record

Yield Prediction for Winter Wheat with Machine Learning Models Using Sentinel-1, Topography, and Weather Data

Agriculture 2023, 13(4), 813; https://doi.org/10.3390/agriculture13040813
by Oliver Persson Bogdanovski 1,†, Christoffer Svenningsson 1,†, Simon Månsson 2, Andreas Oxenstierna 3 and Alexandros Sopasakis 1,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 17 January 2023 / Revised: 16 March 2023 / Accepted: 25 March 2023 / Published: 31 March 2023
(This article belongs to the Special Issue Applications of Data Analysis in Agriculture)

Round 1

Reviewer 1 Report

This is a paper on a relevant topic that lies within the scope of Agriculture. The work aims to explore the abilities of two different machine learning algorithms to forecast winter wheat production for fields in southern Sweden. Overall, the paper presentation requires significant improvement. The authors need to address the following issues before it can be accepted.

1. First of all, the title of the paper is not accurate enough and needs to be revised. The name and location of crops for yield simulation are unclear. It is difficult for readers to understand the specific information of the article.

2. The structure of the paper should be modified. For example, Line 326 has Section “4.1 feature importance” and Line 391 has “6.1 future work”, yet neither is followed by any other subsection.

3. In section “1 Introduction” and section 2 “State of the art and challenges”, the authors did not focus on the research progress of crop yield simulation models and methods. More recent studies on remote sensing-based crop yield simulation need to be reviewed, and specific scientific questions should be posed based on the state of knowledge. The goal of this study is divergent and needs further focus based on the literature review.

4. The study area needs to be introduced to the reader separately.

5. In section “3.5 Models”, there are two machine learning algorithms used in this study. We know that each machine learning method has its advantages and disadvantages. Why did you choose these two ML models and not other models?

6. Lines 256-270: the authors explained how the different spatial resolutions of the data were matched, but the temporal resolution of the various data sources is still unclear.

7. Lines 294-296 and Figure 10: please explain the difference in height between the predicted and measured distributions for 2019, and show the predicted results for the other years.

8. Lines 328-334: please show the feature importance scores.

9. Line 342, please give the specific data and results.

10. Lines 349-350: it is recommended to show the results of simpler models (linear and logistic regression) in the supplementary files.

11. Figure 1 is not cited in the text.

12. Line 160, (Fig.3)) →(Fig.3)

13. Figure 4: the map should be marked with longitude, latitude, and a legend for the different colored dots. What do the numbers in the right figure represent?

14. Line 193 (see Fig.6)) →(Fig.6)

15. Table 1: the table style does not suit the journal template.

16. It is noted that the manuscript needs careful editing by someone with expertise in technical English editing paying particular attention to sentence structure so that the goals and results of the study are clear to the reader.

Author Response

 

1. First of all, the title of the paper is not accurate enough and needs to be revised. The name and location of crops for yield simulation are unclear. It is difficult for readers to understand the specific information of the article.

Response. 

We have now changed the title to "Yield prediction for winter wheat with machine learning models using Sentinel-1, topography, and weather data". The information about the name and location of crops is now also included in the abstract, in the introduction and more importantly displayed clearly in Figure 3 together with latitude and longitude coordinates. We also now include a new Subsection 3.1 named Study Area which further clarifies these points.

 

2. The structure of the paper should be modified. For example, Line 326 has Section “4.1 feature importance” and Line 391 has “6.1 future work”, yet neither is followed by any other subsection.

Response.

Following the Referee's recommendation, these subsection titles and numbers have been removed, and the whole paragraph in lines 328-338 has been rewritten to improve the English and better clarify the concept of feature importance for this model.

 

3. In section “1 Introduction” and section 2 “State of the art and challenges”, the authors did not focus on the research progress of crop yield simulation models and methods. More recent studies on remote sensing-based crop yield simulation need to be reviewed, and specific scientific questions should be posed based on the state of knowledge. The goal of this study is divergent and needs further focus based on the literature review.

Response.

We see two distinct concerns raised here by the Referee, and we would therefore like to address each separately:
3a) the need to review more recent studies of remote sensing-based crop yield simulation; and
3b) the concern that the goal of this study is too divergent and must be focused further based on the literature review.
We address both points in the manuscript in lines 33-69, and we also reply to each specifically below:
Response on 3a.
Following the Referee's recommendation, we now include new references [4-7]. Reference [4] is a recent review article dealing specifically with remote sensing data and their contribution to crop growth models, article [5] deals with the limitations of crop growth models, and articles [6, 7] deal mostly with remote sensing capabilities for crop growth models.
Response on 3b.
We think that the Referee's comment about our focus being too divergent stems partly from the fact that we do not directly examine indicators that the above studies show to be critical, such as leaf area index (LAI), evapotranspiration (ET), and the fraction of absorbed photosynthetically active radiation (fAPAR). Studying and improving simulations of these quantities is typically linked directly to more accurate crop yield estimates, as shown in many similar studies of remote sensing contributions to crop growth models.
Our aim, however, was not to model these quantities directly, which has already been done in many studies, but instead to examine whether a machine learning method could learn the full end-to-end model and predict yield directly, effectively bypassing the crop growth model. This is the main reason we did not refer directly to LAI, ET, or fAPAR but only to the final result, namely the yield forecast. We have now added comments to this effect in both the Introduction and the State of the Art sections of the manuscript, and we hope they make the focus of this study clearer. We thank the Referee for this observation and welcome any further suggestions of specific recent developments we could cite to further focus our approach in the context of current work in the field.

4. The study area needs to be introduced to the reader separately.

Response. 

We have now included a short Subsection 3.1, "Study Area", containing information about the geographical location of the study as well as specific details on the number and size of the fields. Specifically, we now include the following text in lines 177-181:
"The area of study was the region of Skåne, southern Sweden (Fig. 3). The number of winter wheat fields from which ground-truth yield data were gathered was 49, 31, 132, and 88 for the years 2017, 2018, 2019, and 2020, respectively. The distribution of field sizes had a mean of 20 hectares and a median of 14 hectares." Figure 3 is also included in that subsection.

 


5. In section “3.5 Models”, there are two machine learning algorithms used in this study. We know that each machine learning method has its advantages and disadvantages. Why did you choose these two ML models and not other models?

Response.

We chose the LGBM and feed-forward neural network (FNN) models because, in our opinion, they are two classic machine learning approaches that any machine learning study should start from. In particular, the LGBM model is simple to construct and deploy, yet versatile (i.e., it can grow or shrink to fit the given data) and powerful enough to describe patterns in data from a large number of diverse sources. The FNN, in turn, is as simple a neural network as possible, yet it is a classic neural network rather than a decision tree-based model. We therefore wanted to determine whether the simple FNN architecture could describe our data as well as the decision tree-based model.
This information is now also included in the manuscript at the end of Section 3.5 (Models), in lines 271-273.

 

6. Lines 256-270: the authors explained how the different spatial resolutions of the data were matched, but the temporal resolution of the various data sources is still unclear.

Response:

We have now included the following clarifying text in lines 166-167 of the manuscript, as well as in the caption of Figure 3:
"In general, however, we retained only a single contribution per week from each of our three input data sources (Sentinel-1, weather, and topography)."
Details on how this single contribution was chosen (namely, the first unobstructed satellite contribution during that week) are given in lines 161-166.

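As an illustration only, the weekly selection rule described above (keep the first unobstructed acquisition in each week) can be sketched in Python as follows. The tuple layout, the `is_unobstructed` flag, and the use of ISO calendar weeks are assumptions made for this sketch; the manuscript's pipeline may define weeks and obstruction differently.

```python
from datetime import date

def first_unobstructed_per_week(observations):
    """Keep the first unobstructed observation of each ISO calendar week.

    observations: iterable of (acquisition_date, is_unobstructed, value) tuples
    (a hypothetical layout chosen for this sketch).
    Returns a dict mapping (iso_year, iso_week) -> (acquisition_date, value).
    """
    selected = {}
    # Process acquisitions in chronological order so "first" is well defined.
    for day, is_unobstructed, value in sorted(observations, key=lambda obs: obs[0]):
        if not is_unobstructed:
            continue  # skip acquisitions flagged as obstructed
        week = tuple(day.isocalendar())[:2]  # (ISO year, ISO week number)
        if week not in selected:
            selected[week] = (day, value)  # first usable acquisition wins
    return selected

if __name__ == "__main__":
    # Hypothetical backscatter-like values, listed out of order on purpose.
    obs = [
        (date(2019, 4, 5), True, -10.9),
        (date(2019, 4, 1), False, -12.0),
        (date(2019, 4, 3), True, -11.5),
        (date(2019, 4, 9), True, -13.2),
    ]
    for week, (day, value) in sorted(first_unobstructed_per_week(obs).items()):
        print(week, day, value)
```

Here the obstructed Monday acquisition is skipped, so the Wednesday acquisition represents that week, and the following week gets its own single contribution.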
 

7. Lines 294-296 and Figure 10: please explain the difference in height between the predicted and measured distributions for 2019, and show the predicted results for the other years.

Response.

New figures have now been added (Figures 10, 11, and 13) presenting the data for 2017, 2018, and 2020. They show agreement similar to that obtained for 2019 (the year originally presented). The text was revised accordingly in lines 326-334.

 

8. Lines 328-334: please show the feature importance scores.

Response. 

We have now included the feature importance scores in a new Appendix D of the manuscript, together with a short discussion of which features seemed most important for predicting yield. We also refer the reader to the GitHub repository for the full results.


9. Line 342, please give the specific data and results.

Response.

We now include the corresponding plot for the FNN model on the right-hand side of the current Figure 9 of the manuscript, which also shows the decrease as the size of the dataset increases.

 

10. Lines 349-350: it is recommended to show the results of simpler models (linear and logistic regression) in the supplementary files.

Response.

We have now added Appendix C with results from linear regression simulations.

 

11. Figure 1 is not cited in the text.

Response. 

Figure 1 is now referenced as suggested.

 

12. Line 160, (Fig.3)) →(Fig.3)

Response. 

It is now corrected as suggested.

 

13. Figure 4: the map should be marked with longitude, latitude, and a legend for the different colored dots. What do the numbers in the right figure represent?

Response:

Longitude and latitude have been added to the map, and all data points now have the same color, since the color carried no useful information in the context of the figure. The right figure has been simplified by removing all the numbers, which had no significance for what the figure was meant to convey. The figure caption was also revised, so the purpose of the figure should now be clear.


14. Line 193 (see Fig.6)) →(Fig.6)

Response.

It is now corrected as suggested.

15. Table 1: the table style does not suit the journal template.

Response:

Minor adjustments were made to the table according to the MDPI layout style guide.

16. It is noted that the manuscript needs careful editing by someone with expertise in technical English editing paying particular attention to sentence structure so that the goals and results of the study are clear to the reader.

Response.

The article has now been re-edited by a native English speaker.

 

Reviewer 2 Report

Dear authors,

Your paper shows that radar data can potentially be used to estimate crop yield. The results look quite convincing. 

However, several issues need to be addressed before it can be accepted for publication:

1) How did you split the training and testing data? Is it possible that you used data points from the same crop field for training and testing? If so, you should at least mention that this may entail problems with spatial autocorrelation. Figure 10 provides some indications of how many data points you used for testing. But it might be better to mention the size of the test data sets.

2) The term "harvest" is not commonly used in agronomy. It might be better to replace it with "yield".

3) You are providing some brief explanations on what can be measured with radar (L82). This is too brief for most agronomists. Can you please expand on these explanations?

4) What is the size of the crop fields you used? I am assuming that radar data may run into problems when applied to small crop fields due to noise?

 

Author Response

1) How did you split the training and testing data? Is it possible that you used data points from the same crop field for training and testing? If so, you should at least mention that this may entail problems with spatial autocorrelation. Figure 10 provides some indications of how many data points you used for testing. But it might be better to mention the size of the test data sets.

Response. 

This is indeed a very important point, and it is now addressed in more detail in Section 3.7 of the manuscript. The following text now appears in lines 263-268:
"Our dataset is created from a total of 49, 31, 132, and 88 fields for the years 2017, 2018, 2019, and 2020, respectively (Fig. 7). Some of these fields were adjacent to each other, while others were far apart, as can also be seen in Figure 4. We trained a separate algorithm for each data year. For each year, we therefore randomly split the corresponding data into training and testing sets at a ratio of 80:20."
Similarly, we now include the following clarifying text in lines 200-203 to address concerns about possibly using data points from the same fields:
"Data leakage occurs when information from one data point overlaps another data point. By choosing grids that are 50 meters apart and centers with 25 meters distance, as can be seen in the middle part of Fig. 6, we guarantee that there will be no overlap or duplicated information in our data."
As the Referee points out, however, it is indeed possible that we used (non-overlapping) data points from the same crop field for both training and testing. This is a classic issue in several scientific disciplines in which input data points are inherently close to one another. In that case, as the Referee suggests, spatial autocorrelation may indeed arise simply from the proximity of the data. We discuss such proximity-related correlation effects in the Discussion section (lines 394-404) and have also added a reference [38] addressing such autocorrelation phenomena.
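A minimal Python sketch of the two splitting strategies under discussion, for illustration only. `random_split` mirrors the point-level 80:20 random split described in the response; `grouped_split` is a swapped-in alternative (not the procedure used in the manuscript) that keeps all points of a field on the same side of the split, which is one common way to limit the spatial autocorrelation the Referee mentions. The function names, the `field_of` accessor, and the toy data are hypothetical.

```python
import random

def random_split(points, train_frac=0.8, seed=0):
    """Point-level random split (like the 80:20 per-year split described above)."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def grouped_split(points, field_of, train_frac=0.8, seed=0):
    """Field-level split: every point of a given field ends up on one side.

    Alternative shown for comparison only; it is not the manuscript's procedure.
    field_of: hypothetical accessor returning the field identifier of a point.
    """
    fields = sorted({field_of(p) for p in points})
    random.Random(seed).shuffle(fields)
    train_fields = set(fields[:round(train_frac * len(fields))])
    train = [p for p in points if field_of(p) in train_fields]
    test = [p for p in points if field_of(p) not in train_fields]
    return train, test

if __name__ == "__main__":
    # Toy data: 10 hypothetical fields with 5 grid points each, as (field_id, point_id).
    points = [(f, i) for f in range(10) for i in range(5)]
    train, test = random_split(points)
    print(len(train), len(test))
    g_train, g_test = grouped_split(points, field_of=lambda p: p[0])
    shared_fields = {p[0] for p in g_train} & {p[0] for p in g_test}
    print(len(g_train), len(g_test), shared_fields)
```

With the point-level split, points from one field can appear in both sets; with the field-level split, no field is shared, so test scores reflect performance on unseen fields, which is typically a stricter evaluation.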

 

2) The term "harvest" is not commonly used in agronomy. It might be better to replace it with "yield".

Response.

Thank you, it is now replaced in the full document.

 

3) You are providing some brief explanations on what can be measured with radar (L82). This is too brief for most agronomists. Can you please expand on these explanations?

Response.

We agree with the Referee's comment. We believe that the true physiological basis of this backscatter is not yet known and that further study on the ground is needed to understand exactly what is being measured. Based on current knowledge, however, we have now expanded that paragraph with further details in lines 89-98 as well as in lines 125-135. The text added in lines 89-98 is the following:
"For example, in the case of sugar beet, radar data can be used to monitor the emergence and closure of the canopy, as well as to estimate crop height and biomass throughout the growing season. Similarly, for potato crops, radar data can be used to monitor canopy closure, as well as to detect changes in biomass and moisture content. For maize and wheat, radar data can be used to monitor crop growth and development, as well as to estimate biomass and yield. In the case of English ryegrass, radar data can be used to estimate grass height and biomass. Specifically, for winter wheat in the Netherlands [14], all phenological stages (tillering, stem elongation, booting, heading, flowering, fruit development, ripening, and harvesting) could be closely matched to Sentinel-1 VV and VH backscatter data."
Later, where we discuss Sentinel-1 capabilities in more detail, we added the following text in lines 125-135:
"Sentinel-1 is a radar imaging satellite that can provide information about the Earth’s surface regardless of weather conditions or daylight. This makes it an ideal tool for monitoring agricultural crops over time. The radar signal emitted by Sentinel-1 interacts with the crops and provides information about their physical characteristics, such as height and structure. This information can be used to monitor the growth and development of crops throughout the season, allowing farmers and agronomists to make informed decisions about irrigation, fertilization, and pest management. With Sentinel-1 data, it is possible to measure the backscatter, or the amount of radar energy reflected back to the satellite from the Earth’s surface. This backscatter signal can be used to derive several biophysical parameters related to vegetation, such as vegetation water content, biomass, and vegetation height."


4) What is the size of the crop fields you used? I am assuming that radar data may run into problems when applied to small crop fields due to noise?

Response: 

The distribution of field sizes had a mean of 20 hectares and a median of 14 hectares. We do not believe this is an issue, since the Sentinel-1 data we used have a resolution of 11 × 11 m and our resampled harvest grid dataset has a resolution of 50 × 50 m. This means that several Sentinel-1 pixels are aggregated into every grid data point of the harvest dataset, regardless of the total field size. Text clarifying the field sizes has now been added in lines 216-217 of Section 3.4.
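The arithmetic behind this answer can be checked with a few lines of Python. This is a back-of-the-envelope sketch that assumes square pixels of 11 m side and square 50 m grid cells, and it ignores partial cells at field boundaries; these simplifications are made only for the sketch and are not taken from the manuscript.

```python
# Back-of-the-envelope check of the resolution argument (illustrative only).
PIXEL_SIDE_M = 11.0       # Sentinel-1 pixel side, as stated in the response
CELL_SIDE_M = 50.0        # side of one resampled yield grid cell
M2_PER_HECTARE = 10_000.0

def pixels_per_cell(pixel_side=PIXEL_SIDE_M, cell_side=CELL_SIDE_M):
    """Approximate number of Sentinel-1 pixels covering one grid cell."""
    return (cell_side / pixel_side) ** 2

def cells_per_field(area_ha, cell_side=CELL_SIDE_M):
    """Field area expressed in grid cells (boundary effects ignored)."""
    return area_ha * M2_PER_HECTARE / cell_side ** 2

if __name__ == "__main__":
    print(round(pixels_per_cell(), 1))  # 20.7
    print(cells_per_field(20))          # mean field size: 80.0
    print(cells_per_field(14))          # median field size: 56.0
```

Under these assumptions, each 50 m cell aggregates roughly 20 Sentinel-1 pixels, and even a field of median size covers on the order of 56 grid cells.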

Round 2

Reviewer 1 Report

The authors have made sufficient modifications according to the comments, and I suggest that this paper be accepted without further modification.
