Peer-Review Record

Automated Agave Detection and Counting Using a Convolutional Neural Network and Unmanned Aerial Systems

Drones 2021, 5(1), 4; https://doi.org/10.3390/drones5010004

by Donovan Flores¹,*, Iván González-Hernández¹, Rogelio Lozano¹,², Jesus Manuel Vazquez-Nicolas¹ and Jorge Luis Hernandez Toral¹

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Submission received: 16 October 2020 / Revised: 16 November 2020 / Accepted: 17 November 2020 / Published: 1 January 2021

Round 1

Reviewer 1 Report

This paper aims to supervise an agave plantation by autonomously counting the number of agave plants in the field using image processing techniques. Since this paper introduces a method for agave monitoring, I recommend that you compare it with deep learning techniques. You compared your method with a Haar-like classifier, but you should also compare it with methods such as YOLO and SSD, which are object detection methods based on deep learning.

Point 1

As I mentioned previously, you should compare your method with a deep learning method. Even if you would not like to do so, you should at least cite some papers that used such methods. How about mentioning this as future work in the conclusion?

For example, I found the following papers; please check them.

Itakura, K. and Hosoi, F., 2020. Automatic Tree Detection from Three-Dimensional Images Reconstructed from 360° Spherical Camera Using YOLO v2. Remote Sensing, 12(6), p.988.

Zhong, Y., Gao, J., Lei, Q. and Zhou, Y., 2018. A vision-based counting and recognition system for flying insects in intelligent agriculture. Sensors, 18(5), p.1489.

Point 2

2.3.3: How about adding an equation number?

Point 3

As shown in Fig. 8, this method consists of 3 stages. What are the merits of the multi-stage method compared to end-to-end methods like YOLO and SSD?

Point 4

What do you expect to happen when you apply this classifier to other agave fields? Can the agave plants be detected accurately there, too? Or, in your opinion, is there something more you would have to do?

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 2 Report

The manuscript “Agave detection and counting based on a convolutional neural network using unmanned aerial vehicle technology” presents an interesting case study demonstrating the use of open access computer vision approaches for automating the counting of agave plants. This work has the potential to support agricultural applications of these approaches to mitigate problems with existing systems, and therefore could be a valuable addition to the knowledge base.

This work appears technically sound and to have been carefully conducted; however, in order to ensure that this work is of the most value to the potential readership, I recommend substantial revisions to the structure and presentation of this report. Extensive editing is required to improve the clarity of your report. I make some suggestions for achieving this improvement below, but please note that these suggestions are not exhaustive and I encourage you to reflect critically on what you wish to achieve with this report.

Technical suggestions

Consider changing your title to “Automated agave detection and counting using convolutional neural network and unmanned aerial systems”

I recommend you add ORCID for each co-author. https://orcid.org/

L1: Change to “We present an automatic…”

L2 Change to “…aerial data…” (not views).

L2: Change to “Our objective was to autonomously count the number of agave plants in an area to aid management of…”

L4: Remove ‘real’, this is implicit.

L9: GPS (actually GNSS) and IMU are standard parts of these systems and should not be specified here.

L13: Finish abstract with one sentence stating the significance of your study for others.

L15: Introduction: These short paragraphs are very fragmented and should be re-structured to aid the flow of this narrative. Consider using paragraphs of ca. 6-10 sentences each.

L22-31: References needed to corroborate these points.

L33: report the larger numbers from the most recent year (2017?) rather than 2016.

L35: replace “…by immature …” with “…with immature…”

L38-42: References needed to integrate this further

Figure 1 – If there are good estimates of the theft of agave, can you use these to add a 2nd panel to Figure 1 illustrating the rise in theft and financial implications? This will help to demonstrate why your work is needed.

L44: This should be refined for clarity.

L50: references needed.

L55-65: I don’t understand the importance of this information, please revise and/or shorten it. Consider points made in https://zslpublications.onlinelibrary.wiley.com/doi/full/10.1002/rse2.58

L69: phase (not phases)

L80: Why not clouds? (Explain!).

L83-110: combine and shorten this material in a single paragraph.

L131: add scientific genus for the ‘palm oil tree’. See also this work on drone ID of palm oil https://www.tandfonline.com/doi/full/10.1080/01431161.2019.1591651

L140-144: This description of the report structure could usefully be removed for clarity (shorter is better).

L147 – I would expect to see a 2.1 site description here, reporting some basic characteristics of your study site (location, climate, soils, etc.). Then followed by a 2.2 section on data collection (reporting drone properties, flight parameters, overlap and exposure settings, time and date of flights, weather conditions, etc.). Don’t duplicate the drone platform name. State that this was a multirotor. Can you add camera specifications (field of view and MP). Why seek to reduce shadows? (I know why, but you should explain in your report). Also, there are still shadows even at solar noon. Note that solar noon is not exactly the same time as noon!

L161: why did the light intensity vary? How much? This detail matters.

L162: include a section of the data processing – how were the orthomosaics generated? (Software, processing parameters, etc.). Then afterwards include a paragraph describing your manual segmentation of training data etc.

Figure 2 – swap the order of testing and validation.

L190: ‘Quality’ is too vague here, exactly what changes?

L201: 45 degrees isn’t a small pitch angle.

L201: I do not understand why this part of the study was needed. Why can’t the identified plants be geolocated from the orthomosaics to identify the exact position of every plant? Why the need to detect the furrows? You need to make large changes to improve the rationale for this part of your work.

If you keep them, combine Figures 5 and 6.

L273-274: Explain more about this parameter tuning? (remember that you can add supplementary materials with extra detail and figures on this aspect of your work).

L433-434: This point is important – can you describe further the implications of these ‘operational conditions’ (Discussed further in Duffy et al., 2017 – linked above)

L392: at this ‘stage’ would be better than ‘moment’.

Figure 11 – “Example of an original…”?

L393-394: Can you combine figures 15, 16 and 17 into one (tall?) figure for a single page (shorter is better) and add annotations of where this crossing is shown?

L399: what is the loss? Present these numbers so that the readers can assess for themselves whether the loss is significant.

L438: I don’t quite understand where this 98% comes from? It appears different from the overall accuracy value I would expect based on my reading of the results as currently written? Please review this carefully.

L446: This appears incomplete. Please refer to CRediT for a full list of roles and update this accordingly.

L448: You need to revise this conflicts of interest statement to confirm which statement is applicable for your case.

L456: Most of these references appear to be missing their DOI, please add this information.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Line 12: “better performance”. This should be stated more quantitatively using your results.

Line 45: We have witnessed in recent years a growing interest in applying UAV to improve production in agriculture. We have witnessed in recent years a growing interest in applying UAVs to improve production in agriculture as will be detailed below.

Please correct.

Line 154: Nowadays, many people use the Phantom 4 rather than the Phantom 3. Does the difference between the drones affect the result?

214: figure => Figure?

224: 5 => Figure 5?

314: in the next table 2.5.1

Do you mean Table 2? Your paper includes many typos and similar errors. It is essential to check for errors before publication.

Line 314: You performed color thresholding. What should we do when the lighting conditions change and the color information differs from that in your experiment?

Line 289: pres => precision?

Line 359: Fig => Figure

407: While you mentioned using “%”, the value in Table 1 is 0.98. It should be consistent.

Also, I think a space is needed between the number and %, i.e., “98 %” not “98%”.

410: The table caption should be above the table, shouldn’t it?

Line 409: The precision value should be 0.90 instead of 0.9? Please confirm.

Line 427: figure =>Figure?

Line 487: “IGARSS 2018-2018”. Is it correct? Do you mean “IGARSS 2018”?

Figure or Fig? It should be consistent throughout your manuscript.

Line

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 2 Report

Thank you for revising your manuscript to address the feedback provided; I find the revised report much improved. I have three final suggestions you should consider to refine your report further, but I am happy to recommend it for publication.

Abstract - I recommend changing the final sentence to read “Our results suggest that the proposed algorithm is robust and has considerable potential to help farmers manage agave agroecosystems.” You did not demonstrate an impact on farmers, only the potential for impact.

Figure 1 - Do you have the opportunity to add information on economic loss for additional years to this figure? (This would help to put your point into context, as a trend fitted to two points is not very robust).

Figure 5 – I think the photo and schematic are duplicative – I suggest just retaining the schematic and the final photo in your final report for clarity.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.

Round 1

Reviewer 1 Report

The manuscript has several problems. Even after reading the entire text, it was not clear to me what the real purpose of the work was; would it be testing a computer vision approach to identify Agave or developing a UAV that would recognize plants in real-time?

There are essential flaws in all the manuscript items. Thus, it is clear the authors are not familiar with scientific writing. At the beginning of the reading, I started marking specific issues within the text, but I soon realized the study does not have a sufficient standard for publication. Thus, I will not report here these minor issues that I highlighted, since the text would need complete restructuring to be considered for publication in any good scientific journal.

In the introduction of the manuscript, it is not possible to understand its main objective. The motivation is clear, which is to monitor plant thefts. However, it is not possible to understand the research proposal. A mistake that illustrates the lack of writing scientific rigor is between lines 56 to 128, where authors presented a meaningless literature review. Such review did not justify the work and did not illustrate the challenges that would be faced in the research; there is even methodology being presented in the introduction section.

Reading the methodology, I was unable to understand what the authors wanted to do. They started by talking about mosaics of images collected with a Phantom; then, they talk about another drone with an automatic flying system. Did the authors want to test a computer vision approach or develop a UAV to detect Agave with embedded processing? If it was the first option, adjustments are demanded to remove this part of flight automation, which would not be relevant for the aim of the study; if the second option is the real one, corrections need to be made from the title of the work onwards, as the difference would not be the recognition algorithm, but the flight system and on-board analysis.

Besides the accuracy value of the identification algorithm shown in both methodology and discussion sections, the article does not present any results. What the authors present as a result is, in fact, material and methods content (Lines 317 to 366). There is no discussion of the results at all.

Thus, I suggest the rejection of the manuscript.

Reviewer 2 Report

Intro section is not very cohesive. The literature summary is thorough but would benefit from focusing more specifically on how the methods from the review apply to the problem at hand in this study. Making this section more concise and focused on the research problem would improve the paper.

line 34: grammar

line 57: error - fixed wing UAVs usually have lower payload carrying capacity vs multi-rotors

lines 109-112: this section on within-plant fruit detection does not seem relevant to the study

lines 130-133: unless specifically requested by this journal, this overview of the manuscript layout is not needed, leave out

Methods

Section 2.1 is called 'Plots analysis' but does not describe any plots.

Correct 'ortomosaic' as 'orthomosaic' (repeated many times)

line 144: use GSD for orthomosaic resolution, not DPI. should be a distance metric (cm)
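As a minimal sketch of the ground sample distance (GSD) requested in the comment above, the standard nadir-image relation can be computed as follows. The function name and the camera and flight values are illustrative assumptions and are not taken from the manuscript.

```python
def ground_sample_distance_cm(sensor_width_mm, focal_length_mm,
                              altitude_m, image_width_px):
    """Return the ground sample distance in cm per pixel for a nadir image."""
    # Ground footprint width (m) = sensor width * altitude / focal length.
    footprint_width_m = sensor_width_mm * altitude_m / focal_length_mm
    # Divide by the pixel count and convert metres to centimetres.
    return footprint_width_m * 100.0 / image_width_px


# Illustrative camera and flight values only (assumed, not from the manuscript).
gsd = ground_sample_distance_cm(sensor_width_mm=6.17, focal_length_mm=3.61,
                                altitude_m=100.0, image_width_px=4000)
print(f"GSD = {gsd:.2f} cm/pixel")
```

Unlike DPI, this value is a ground distance per pixel, so it can be compared directly with plant size.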

lines 152-156: orthomosaics are created to remove perspective and preserve distance, not to make a big picture

section 2.1.2: authors need to define what a furrow is (could be added to the intro section)

It is not clear how the laser range finder is being used for navigation, please clarify this.

CNNs - why is the ortho imagery being pre-processed using classical image processing methods prior to use in a CNN. Deep learning should be capable of detecting objects without the need for image transformations and initial image classification.

Results: Instead of using many separate figures for the stepwise image processing workflow, a flowchart with the processing steps could be more useful.

Conclusions: Authors should place an emphasis on the relevance of real-time detection and counting to communicate why the image analysis could not be done after the image collection. Edge-computing is a common practise for UAV mapping applications. Are there benefits to this on-board processing approach?

How is this approach more useful than standard change detection using spectral vegetation indices which are low computational power and well-understood?

How do the authors envision this being deployed for practical use - can a user story be described for the use of this technology?

General:
This manuscript requires some reformatting and structural editing to bring the study together in a more cohesive manner for the readers. Many small 1-2 sentence paragraphs are hard to follow and make this manuscript read as a collection of ideas rather than a story related to one specific challenge or objective.

Reviewer 3 Report

This work on "Agave detection based on CNN…" seems to have a good impact and significance in their location. However, the issues, hypotheses, objectives, and the novel approach developed should be written clearly. This helps the reader directly understand the research questions that are dealt with in the study and get straightforward solutions rather than assumptions. When I read this manuscript, I still have these questions (i) whether the whole UAV platform was built for this application? If yes, what is the need to develop new one rather than off-the-shelf equipment; (ii) whether the image analysis is happening in real-time, or performed in the personal computer? If these two questions were understood clearly, I would have been satisfied. Several writing issues must be revised, and the structure must be revamped. However, I believe the authors will take the comments constructively and improve this paper. I wish to see this paper to be published in the near future. Therefore, I have given several comments below. I hope the authors find this useful and help them revise the manuscript.

Comments:

Title – Since the paper deals with agave counting, it is better to use the keyword 'counting' in the title instead of 'agronomical applications…'. I think you will get better hits if you have a 'stand count' term.
Line 1 – automatic detection of what? Please mention the use case here and how it was achieved. E.g., Automatic detection of agave from ….. (UAV) images using a developed convolutional neural network.
Line 1 – Please change aerials to aerial
The abstract should be in the past tense. Please change 'is' to 'was.'
The citations were not generated in the text. I think the authors have used the LaTeX template to typeset the manuscript. Please check that when you compile and submit to a journal.
Introduction – Please include some citations from the statements you make in paragraphs 1 and 2.
Line 22 – The 'amount of harvest' is termed as 'yield'. Use that terminology.
Also, I am not getting this link here – "the amount of harvest will strongly depend on the security of the plantation". In agriculture, the yield is mainly influenced by nutrients, water, management practices, etc. "Security of the plantation" is directing the reader differently. If the authors are confident with this claim, please cite a case study, which shows some x% was lost in XYZ location because of security issues.
Line 23 – Plant theft is interesting, but I feel it is inappropriate here. Because the paper is not focusing on a security or surveillance system to avoid theft, instead, mention more about the plantation counting point of view.
Line 34 – "The Agave plant is a species which native from Mexico" – Please rephrase and check grammar.
Line 35 – Is the plant left in the field for 8 to 15 years? Or the maturation is done after harvest?
Figure 1 – Is this photo taken by the author? If not, please use a citation.
Line 41 – It should be TRC, not CRT.
Line 42 – It will be better to indicate in USD as well in the parenthesis.
Line 43 – "grew more than triple" does not read well and is not grammatically correct. It can be just replaced with "tripled."
Line 51 – Please use lowercase R for robotics.
Line 52 – This sentence talks about UAV, and line 51 talks about robotics. How are these two linked? If you replace robotics with UAV in line 51, that solves the confusion.
Line 76 – Please replace 'crossing' with 'overlapping.'
Line 91 – Confusing – Both neural networks for identifying large trees?
Line 93 – What is //?
Lines 78-128: You have shown a good amount of literature review. But it will be helpful to mention their accuracies in the detection and discuss more on the technique.
Generally, in any scientific paper, the introduction ends with a few hypotheses and a set of objectives. Please include that.
English language – The paper should undergo a complete English revision to make it more scientific.
Some of the sentences are just stated without any support or logic. For example, how does taking images in May avoid shadows? What implication does it have?
The area of the field has to be mentioned in Section 2.1.
Since the crop grows for ~8 years, if you use an image from one year for training, it would have undergone several changes if you apply it for the next year. How do you tackle this?
Line 160 – Is 'cut' an image analysis term? It should be 'cropped.'
Line 161 – 'put in a single picture' – The image analysis term for this is 'montage.' Please use such terminologies since you are working on the image analysis area.
Line 164 – photographies?
Line 165 – 'On the other hand' is usually used to contradict or compare another item. Not appropriate here.
Line 170 – pictures, images, photographs are used randomly. Please be consistent.
The paragraphs are very short – hardly consisting of 3 sentences. Please join these paragraphs. It looks odd to read.
Is Figure 4 a schematic representation, or is it an output from your algorithm?
Section 2.1.2. Are the furrows sensed in real-time? However, I see there is no image analysis used in this section. Am I getting this right?
I am not able to understand the methods. At one place, the authors are saying DJI Phantom 3, and in Sec. 2.2, there is a separate aerial platform that shows the configuration. Is it the same as the DJI platform? Further explanation of the aerial platform sounds like the authors have developed an entire platform for sensing. Please emphasize on things better so that the readers will be able to grasp the information quickly. What is the need for a new platform? Please mention more on that, so that the readers will know the demand.
The color segmentation is a standard method in image processing. It does not require this general explanation. Instead, mention about what unique things you have covered.
Section 2.4.2 – This is an interesting approach, but I am curious to know if this was computationally intensive. Why don't you simply use the centroid of the segmented geometry? That is simple and straightforward.
Line 300 - What is a heat map here? Do you mean the distance map that is evaluated from binary images? Because as you explain that the whitest point corresponds to the center area. 'Whitest' is not a word; it should be 'brightest' white. Also, if you use -est to any term, then it should be singular.
Line 2.4.3 – CNN detection? What does that mean?
Figure 7 – I do not understand what you mean by 'No agave'. Does that indicate soil background? Or will there be any other plant species in the field?
Line 320 – "The plants, which are cut on the boundaries of" This is generally called 'exclude on edges' in image processing.
Line 323 – Is this one-sentence paragraph?
Line 327 – How many rounds of dilation were applied?
So the furrows were identified with image analysis and not the plant height, as mentioned in Sec. 2.1.2? If this is the approach, then it should be shown in Methods.
Figures 11-13 should be in methods as these are the preprocessing steps applied in your analysis. That is how most scientific papers present them.
The captions of all the figures have to be descriptive and should stand alone. For example, 'Fig. 11. Original image' is the minimum title. Mention something about flight height, resolution, area covered in that image, etc. Apply similar logic to all figure captions.
Line 335 – "marks of all center agave plants"? Not clear. Please rephrase.
Figure 16. It is a single-word caption and not intuitive. Also, I feel the heat map is not the correct term for this.
Until line 345 – These are just demonstrating the methods and cannot be considered as results from this study. Please merge these in methods and move the figures to methods.
Lines 341-345 – These are some results that can be discussed.
Lines 347-359 – These are methods. Please move this.
Line 361-363 – These are validation results. What is the ground truth data you used? Haar classifier is for comparison and cannot be considered as ground truth.
Line 368 – This sentence does not have any relevance to the current work discussed so far.
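To illustrate the centroid alternative raised in the comment on Section 2.4.2, here is a minimal sketch that computes the centroid of one segmented region. The helper name and the small mask are hypothetical, and the sketch assumes each plant has already been isolated as its own binary mask.

```python
def mask_centroid(mask):
    """Return the (row, col) centroid of the nonzero pixels in a binary mask."""
    row_sum = col_sum = count = 0
    for r, row in enumerate(mask):
        for c, value in enumerate(row):
            if value:
                row_sum += r
                col_sum += c
                count += 1
    if count == 0:
        raise ValueError("mask contains no foreground pixels")
    return row_sum / count, col_sum / count


# Hypothetical 5x5 mask with one roughly plant-shaped blob.
mask = [
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]
print(mask_centroid(mask))
```

The centroid needs only one pass over the pixels. If neighbouring plants touch and form a single region, it returns one point for the merged region, so the regions would have to be separated first.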

Reviewer 4 Report

This paper presents a method to detect and count agave plants. This topic is interesting and useful for many potential users. However, there is much room for improvement, so this paper is not acceptable for publication. This kind of study has been done widely, for example in the remote sensing field. The method combines classical image processing techniques with a simple CNN. Powerful networks such as YOLO and SSD are now widely used, and the proposed method should be compared with those models. This paper only compares against a Haar classifier, while much more powerful algorithms, both deep learning and non-deep-learning, are known. Also, the parameters of your method are not specified and only an overview of the method is given, which makes it impossible to repeat your method. Since we do not know the details of your method, we cannot even evaluate how effective it is. Please find my comments below; I hope you will improve this manuscript in the future. Further, the discussion is too short and not ready for acceptance for publication.

Your study is interesting and can contribute to monitoring in agriculture, including for other plants. This study is very important and I hope it is applied in the field in the near future. However, as mentioned below, this paper lacks the details of your method, which makes it difficult for an interested reader to repeat and follow your study. Also, the discussion is too short and many things that should be written are not present. Therefore, I think this paper cannot be accepted so far. I hope you take your time and improve your work to attract many more potential readers.

Introduction

What is the problem in the prior studies? As many efforts have been made for the detection of plants, I did not understand why this study is important.
First of all, do you really need to conduct object detection? If you would like to monitor growth or perform change detection, correlating with the total number of plants or using a classical change detection technique suffices for your objective. This paper just detects and counts the plants, without making use of the detection result. Object detection is not the best solution for your objective.

Method

You divided your data into training, validation and test data. However, the test data come from the same image as the training data. This kind of validation is, of course, informative as a first step, but it would be much better to use data from a different place. For example, you could prepare two large orthomosaic images: the first image is used for training and validation, and the final test is then performed on the second orthomosaic.
Color segmentation: you used only one image for the training and test, therefore, the color segmentation was stable. However, this kind of segmentation becomes unstable when the light condition and weather are different from the ones in training images. Color information is not robust because the information is different from time to time even at the same place. What do you think about this problem?
Drone Deploy 3D Mapping: how about specifying the information about this company? Company name, the headquarter place …
How about describing the parameters or setting you used for creating orthomosaics? The orthomosaic you created looks so nice and it will be informative when you specify how to create it.
The CNN network: you built a CNN network from scratch. However, in my point of view, it is more common to perform fine-tuning/transfer learning using a backbone network such as ResNet. How about resizing the images and use the pre-trained models? The accuracy of the binary classification will be increased.
Detection method: in your current work, the accuracy might be better when you use Yolo v3, SSD and other CNN-based object detection algorithm. I am not sure why you do not compare the accuracy with the state-of-the-art algorithm. We cannot recognize how your method is superior without the result of such new algorithms.
Indeed, many papers have been reported with the CNN-based object detection algorithm in remote sensing. You should cite papers about them in introduction.
Your strategy like a “sliding-window” is used, for example, in R-CNN, fast/faster R-CNN. How about using the algorithm?
The parameters used in CNN training, color segmentation and the other processes are not sufficiently described, which makes it difficult to repeat your study. What is the threshold in color segmentation? What are the optimizer and the initial learning rate? There is no information about them.
Center detection: what did you do if several center points were detected for a single plant? Did you perform some post-processing such as NMS?
How many images were used to create the orthomosaic image? Since the quality of the orthoimage is very influential for the detection accuracy, it should be addressed in detail.
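The post-processing asked about in the center detection comment could look roughly like the sketch below. It is a simplified, distance-based variant of non-maximum suppression (NMS) for points rather than boxes. The function name, the scores and the distance threshold are hypothetical, and nothing here is taken from the authors' method.

```python
import math


def suppress_duplicate_centers(candidates, min_distance):
    """Greedy suppression: keep the best-scoring point in each neighbourhood.

    candidates: list of (x, y, score) tuples.
    min_distance: kept points must be at least this far apart (pixels).
    """
    kept = []
    # Visit candidates from highest to lowest score.
    for x, y, score in sorted(candidates, key=lambda p: p[2], reverse=True):
        # Keep the point only if it is far enough from every kept point.
        if all(math.hypot(x - kx, y - ky) >= min_distance for kx, ky, _ in kept):
            kept.append((x, y, score))
    return kept


# Hypothetical detections: two near-duplicates around (10, 10) and one at (40, 12).
detections = [(10, 10, 0.95), (12, 11, 0.80), (40, 12, 0.90)]
centers = suppress_duplicate_centers(detections, min_distance=8)
print(len(centers), "plants counted")
```

The distance threshold would presumably be tied to the expected plant diameter in pixels, which in turn depends on the orthomosaic resolution.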

Result

You compared only with “Haar classifier”. I think it is essential to compare with new CNN-based algorithm such as Yolo and SSD for the publication.
Haar classifier was performed to compare with the proposed method. You must write about Haar classifier such as the parameters and other details to repeat your study.
The definition of the metrics is mentioned in the result part which should be in the method part.
This method consists of two parts, right? The first step is segmentation into single plant. The second is to classify the target or not. How is the performance of each step?
How about showing examples of positive and negative images for the CNN classification? I imagine that the positive and negative images look very different, meaning a CNN might not be required. If the difference is obvious, I think other image processing techniques which do not require a training process are much better. What are your thoughts?

Discussion

This section is too short. This is one of the most important sections for the publication.
When do missed detections happen? F1 is 96 %. What do the remaining 4 % look like?
How about applying your detector to another study site? The F1 score is high, but this detector might be applicable only to the orthomosaic image used in this study.
You mentioned that “With the created dataset, the training stage in the proposed architecture achieved 95% of accuracy in training and 93% on the validation process.” Why is the F1 score higher than 93 % in Table 1?
Why is the method with the Haar classifier less successful than your method? What made your method much more powerful than the prior methods? This kind of discussion is necessary in the discussion part.
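For the question about accuracy versus F1, the sketch below computes both from confusion-matrix counts. The counts are hypothetical and were chosen only to show that F1 and accuracy are different quantities and need not agree; they are not the manuscript's data and do not explain its reported values. The function assumes non-zero denominators.

```python
def detection_metrics(tp, fp, fn, tn=0):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Hypothetical counts, chosen only to show that the two metrics can differ.
accuracy, precision, recall, f1 = detection_metrics(tp=96, fp=4, fn=4, tn=10)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} F1={f1:.3f}")
```

F1 ignores true negatives while accuracy counts them, so the two are directly comparable only when they are computed on the same samples with the same definition of a positive.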

Conclusions

I think the conclusion part should be written more in detail. How about reading the conclusion part in the journal papers in your target journal?

Others

Did you have a native speaker check this manuscript? If so, how about submitting the proofreading certificate? I heard that a previous reviewer also commented on the grammar in this paper.
