Article

YOLOv5s-T: A Lightweight Small Object Detection Method for Wheat Spikelet Counting

1 College of Information and Management Science, Henan Agricultural University, Zhengzhou 450046, China
2 Henan Grain Crop Collaborative Innovation Center, Zhengzhou 450046, China
* Author to whom correspondence should be addressed.
Agriculture 2023, 13(4), 872; https://doi.org/10.3390/agriculture13040872
Submission received: 7 March 2023 / Revised: 8 April 2023 / Accepted: 12 April 2023 / Published: 15 April 2023
(This article belongs to the Section Digital Agriculture)

Abstract

Utilizing image data for yield estimation is a key topic in modern agriculture. This paper addresses the difficulty of counting wheat spikelets from images in order to improve yield estimation in wheat fields. A wheat spikelet image dataset was constructed from smartphone photographs covering wheat ears at the flowering, grain filling, and mature reproductive stages. Furthermore, a modified lightweight object detection method, YOLOv5s-T, is proposed. The experimental results show that the coefficient of determination (R²) between the predicted and true numbers of wheat spikelets was 0.97 for the flowering stage, 0.85 for the grain filling stage, and 0.78 for the mature stage. Over all three fertility stages, R² was 0.87 and the root mean square error (RMSE) was 0.70. Compared with the original YOLOv5s algorithm, the spikelet detection and counting performance of YOLOv5s-T was not reduced, while the model size was reduced by 36.8% (to only 9.1 MB), GPU memory usage during training was reduced by 0.82 GB, inference time was reduced by 2.3 ms, processing time was reduced by 10 ms, and the computational cost was also reduced. The proposed YOLOv5s-T algorithm significantly reduces the model size and hardware resource requirements while maintaining high detection and counting accuracy, indicating its potential for wheat spikelet counting in highly responsive wheat yield estimation.

1. Introduction

Real-time, accurate crop yield estimation has a significant impact on food security, the social economy, and agricultural productivity [1,2,3,4]. As wheat is one of the three major food crops in the world [5], estimating its yield is an important focus of agricultural production. In general, wheat yield can be estimated from three indicators: spike number per unit area, grain number per spike, and thousand-grain weight [6,7]. Unlike real-time wheat yield monitoring methods using remote sensing data [4,8], which are mostly based on spike number per unit area, wheat spikelet counting is based on grain number per spike and is expected to give more accurate yield estimates because it reflects the physical conditions of wheat fields at a finer granularity. Indeed, spikelet counting has similarly been used for rice, another important crop that is similar to wheat [9]. Therefore, adopting a practical method to count wheat spikelets in the field is of great research importance.
Traditionally, wheat spikelets are counted manually, which entails a heavy workload and low efficiency and is therefore unsuited to large-scale use. With the development of computer and network technology, using image data to assist agricultural production has become a growing trend. Because of the rich and comprehensive information images can contain, they have been widely used in pest and disease identification [10,11,12], crop counting [13,14,15], weed identification [16,17], phenotype analysis [18,19], etc. Adopting image data for wheat spikelet counting therefore holds significant research potential.
Once color images capture the phenotypic characteristics of crops [20], digital image processing methods can extract color, texture, shape, and other features. In [21], Zhou et al. applied X-ray computed tomography (CT) imaging to measure and count wheat ear phenotypic characteristics; because the required instruments are expensive and the imaging systems complex, this method is limited to laboratory environments. In general, such image processing techniques often require manual intervention, which reduces the efficiency and effectiveness of counting. Deep learning methods provide a promising way to address these problems. In [22], Yang et al. introduced a CNN classification model to estimate corn grain yield, trained on high-resolution hyperspectral imagery captured by a UAV. To count wheat ears accurately, Pound et al. [23] proposed a stacked hourglass network model to enhance a multitask deep-structure network, constructing the ACID dataset of wheat ear images with an all-black background. In [24], Fernandez-Gallego et al. reached a counting accuracy of 90% using contrast enhancement and filtering. However, the works [23,24] required extensive image preprocessing, and such preprocessed images do not fully reflect physical field conditions. Subsequently, in [25], Liu et al. used a deep convolutional neural network to estimate the density and count of wheat ears in the field; the coefficient of determination R² between the estimate and the true value reached 0.9.
Although wheat ear counting has been adequately investigated, studies on wheat spikelet counting remain insufficient. As stated above, compared to wheat ears, spikelet counts provide a more accurate basis for wheat yield estimation. Recently, some studies have considered wheat spikelet counting [26,27], but they addressed neither counting in the field nor real-time availability. With the development of the Internet of Things (IoT), multi-access edge computing, and artificial intelligence, smart agriculture can achieve real-time wheat spikelet counting [28]. Without loss of generality, implementing smart agriculture requires a large number of intelligent IoT devices, which are constrained by limited computing capacity, storage space, and self-sufficient energy [29,30,31]. This motivates a lightweight wheat spikelet counting method that maintains a tolerable counting performance. Here, a wheat spikelet image dataset was constructed, containing images of wheat ears at three reproductive stages (flowering, grain filling, and maturity), all with a real field background, and YOLOv5s-T, a modified lightweight object detection method, was proposed. The results demonstrate that YOLOv5s-T significantly reduces the model size and hardware resource requirements while guaranteeing high detection and counting accuracy, indicating its potential for wheat spikelet counting in highly responsive wheat yield estimation.

2. Materials and Methods

2.1. Dataset Construction

The wheat ear images used in this study were collected from the experimental field (113°8′ E, 34°13′ N) at the Xuchang Campus of Henan Agricultural University, which covers an area of 3300 m². Sampling took place from 8:00 a.m. to 10:00 a.m. between 22 April 2021 and 28 May 2021, with wheat ears randomly selected from the field. The images cover the flowering, grain filling, and maturity stages at an approximate ratio of 1:2:2. The wheat varieties collected were AK58, Xinong, Yumai49, and Zhoumai27. Side-view images of the wheat ears were taken at a resolution of 3024 × 4032 pixels in JPG format. Typical samples from the dataset are shown in Figure 1.
After filtering and sorting the collected wheat spikelet images, 1750 images were retained. Since there are morphological differences between the spikelets at the tip of the wheat ear and those on the main ear body, the spikelets were divided into two categories: spikes and spikelets. The labelImg tool (https://github.com/tzutalin/labelImg, accessed on 15 January 2023) was used for image labeling, generating txt label files in YOLO format. An example of data labeling is shown in Figure 2. The wheat spikelet dataset was randomly divided into training, validation, and test sets at a ratio of 8:1:1: 1398 images for training, 176 for validation, and 176 for testing.
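To make such a split reproducible, a minimal Python sketch of the 8:1:1 random partition is given below. The directory names and file layout are assumptions for illustration, not the authors' actual tooling.

```python
# Minimal sketch: randomly split labeled images into train/val/test at 8:1:1.
# Assumes each image has a YOLO-format .txt label file beside it.
import random
import shutil
from pathlib import Path

def split_dataset(image_dir: str, out_dir: str, seed: int = 0) -> None:
    images = sorted(Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(images)
    n = len(images)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    splits = {
        "train": images[:n_train],
        "val": images[n_train:n_train + n_val],
        "test": images[n_train + n_val:],
    }
    for name, files in splits.items():
        img_dst = Path(out_dir) / name / "images"
        lbl_dst = Path(out_dir) / name / "labels"
        img_dst.mkdir(parents=True, exist_ok=True)
        lbl_dst.mkdir(parents=True, exist_ok=True)
        for img in files:
            shutil.copy(img, img_dst / img.name)
            label = img.with_suffix(".txt")  # YOLO-format label next to image
            if label.exists():
                shutil.copy(label, lbl_dst / label.name)

split_dataset("wheat_images", "wheat_dataset")  # hypothetical paths
```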

2.2. YOLOv5s-T Network Structure

YOLOv5 was proposed by Ultralytics LLC in May 2020 [32]. It is a one-stage object detection algorithm that improves on YOLOv4 [33]. Depending on model depth and width, it comes in four versions: YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x, of which YOLOv5s is the smallest.
Normally, the model extracts object features from images using convolution operations, which require substantial computational resources. Therefore, following the idea of neural network pruning, this study directly reduced the number of convolution operations: smaller convolution kernels were used in the spatial pyramid pooling (SPP) structure, and the number of output channels of some convolutional units in the original backbone was reduced. The goal was to shrink the model and its computing-resource usage as much as possible, lowering the cost of neural network training and deployment while preserving reliable detection performance. YOLOv5s-T is based on the YOLOv5s model; its network thus consists of input, backbone, neck, and prediction. The specific network structure is shown in Figure 3.
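The paper does not publish the exact YOLOv5s-T layer configuration, but the following PyTorch sketch illustrates the kind of change described: an SPP block with smaller pooling kernels and fewer output channels. The kernel sizes (3, 5, 7) and the 512 → 256 channel reduction are illustrative assumptions; the stock YOLOv5 SPP uses pooling kernels (5, 9, 13).

```python
# Illustrative SPP block with reduced pooling kernels and output channels.
# Exact values in YOLOv5s-T are assumptions, not taken from the paper.
import torch
import torch.nn as nn

class SPP(nn.Module):
    def __init__(self, c_in: int, c_out: int, kernels=(3, 5, 7)):
        super().__init__()
        c_hidden = c_in // 2
        self.cv1 = nn.Conv2d(c_in, c_hidden, 1, 1, bias=False)
        self.pools = nn.ModuleList(
            [nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in kernels]
        )
        # Concatenate the identity branch with one branch per pooling kernel.
        self.cv2 = nn.Conv2d(c_hidden * (len(kernels) + 1), c_out, 1, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.cv1(x)
        return self.cv2(torch.cat([x] + [p(x) for p in self.pools], dim=1))

# 256 output channels instead of the original 512 (assumed reduction).
spp = SPP(c_in=512, c_out=256)
print(spp(torch.randn(1, 512, 20, 20)).shape)  # torch.Size([1, 256, 20, 20])
```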
Input: The input stage comprises three parts: mosaic data augmentation, adaptive anchor box calculation, and adaptive image scaling. Mosaic data augmentation stitches four images into a new image through random scaling, cropping, and arrangement, increasing data complexity and enriching the dataset. Examples of mosaic-augmented wheat images are shown in Figure 4. Adaptive anchor box calculation derives suitable anchor values for different datasets. Adaptive image scaling uniformly scales the original image to the standard input size required by the network model.
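As an illustration of adaptive image scaling, the sketch below letterboxes an arbitrary image to a square network input while preserving the aspect ratio. The 640-pixel size and gray padding value 114 follow common YOLOv5 practice and are assumptions here.

```python
# Sketch of adaptive image scaling (letterboxing): resize the long side to
# the network input size and pad the remainder, preserving aspect ratio.
import cv2
import numpy as np

def letterbox(img: np.ndarray, size: int = 640, pad_value: int = 114) -> np.ndarray:
    h, w = img.shape[:2]
    scale = size / max(h, w)
    resized = cv2.resize(img, (round(w * scale), round(h * scale)))
    out = np.full((size, size, 3), pad_value, dtype=np.uint8)
    top = (size - resized.shape[0]) // 2   # center vertically
    left = (size - resized.shape[1]) // 2  # center horizontally
    out[top:top + resized.shape[0], left:left + resized.shape[1]] = resized
    return out

print(letterbox(np.zeros((4032, 3024, 3), dtype=np.uint8)).shape)  # (640, 640, 3)
```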
Backbone: The backbone consists of four parts: the C3 structure, the Conv module, the SPP structure, and the focus structure. The C3 structure, simplified from the cross stage partial network (CSPNet) [34], enhances the learning ability of convolutional neural networks; the number of repetitions of the C3 module was halved relative to the original. The Conv module is used for sampling and feature splicing. In this study, the number of output convolutional channels before the SPP structure was reduced by 256 relative to the original. The SPP structure exploits both local and global features. The main task of the focus structure is to slice adjacent pixels of the original image: the original three-channel input is expanded fourfold to twelve channels, and a twofold-downsampled feature map is obtained through a convolution operation. The slicing operation is shown in Figure 5.
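The focus slicing operation can be expressed compactly in PyTorch. The sketch below follows the description above (3 channels → 12 channels, twofold downsampling, then a convolution); the output channel count is illustrative.

```python
# Sketch of the focus slicing step: take every other pixel in each spatial
# direction, concatenate the four sub-images on the channel axis (3 -> 12
# channels, H and W halved), then apply a convolution, as in YOLOv5's Focus.
import torch
import torch.nn as nn

class Focus(nn.Module):
    def __init__(self, c_in: int = 3, c_out: int = 32):
        super().__init__()
        self.conv = nn.Conv2d(c_in * 4, c_out, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Four interleaved sub-images, concatenated on the channel dimension.
        return self.conv(torch.cat([
            x[..., ::2, ::2], x[..., 1::2, ::2],
            x[..., ::2, 1::2], x[..., 1::2, 1::2],
        ], dim=1))

focus = Focus()
print(focus(torch.randn(1, 3, 640, 640)).shape)  # torch.Size([1, 32, 320, 320])
```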
Neck: The neck combines a feature pyramid network (FPN) and a path aggregation network (PAN). The FPN transmits feature information from upper to lower layers for fusion through top-down upsampling, while the PAN splices strong localization features from lower to upper layers through bottom-up downsampling. This design effectively strengthens the network's feature-fusion capability. The structure is shown in Figure 6.
Prediction: The output stage produces the statistical predictions and consists of two major parts: the loss function and non-maximum suppression (NMS). The loss function evaluates how far the model's predictions deviate from the true values and usually affects model performance; in YOLOv5, the generalized intersection over union (GIoU) loss [35] is used as the loss function for the bounding anchor boxes. NMS removes highly redundant prediction boxes.
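For illustration, the redundant-box removal can be reproduced with torchvision's built-in NMS; the boxes, scores, and IoU threshold below are illustrative values, not taken from the paper.

```python
# Sketch of the NMS step: suppress highly overlapping prediction boxes,
# keeping the highest-scoring box in each overlapping cluster.
import torch
from torchvision.ops import nms

boxes = torch.tensor([[10., 10., 50., 50.],
                      [12., 12., 52., 52.],      # near-duplicate of box 0
                      [100., 100., 150., 150.]])
scores = torch.tensor([0.9, 0.8, 0.7])
keep = nms(boxes, scores, iou_threshold=0.45)    # threshold is illustrative
print(keep)  # tensor([0, 2]) -- the redundant second box is removed
```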

2.3. YOLOv5s-T Loss Function

The intersection over union (IoU) is the ratio of the intersection to the union of the prediction box and the ground-truth box. However, when the predicted box does not intersect the ground-truth box, the IoU is zero and cannot reflect the distance between the two boxes. GIoU solves this problem by additionally penalizing the fraction of the smallest enclosing box not covered by the union. Compared with the IoU loss, the GIoU loss therefore better reflects the distance between the ground-truth box and the prediction box; it is calculated as in (1):
$$ L_{GIoU} = 1 - IoU + \frac{A^{c} - u}{A^{c}}, \qquad (1) $$
where A^c is the area of the smallest enclosing rectangle of the prediction box and the ground-truth box, and u is the area of their union.
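A minimal PyTorch sketch of Equation (1) for axis-aligned boxes in (x1, y1, x2, y2) format follows; it is a didactic reimplementation, not the YOLOv5 source.

```python
# Sketch of the GIoU loss in Eq. (1) for boxes given as (x1, y1, x2, y2).
import torch

def giou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    # Intersection area
    x1 = torch.max(pred[..., 0], target[..., 0])
    y1 = torch.max(pred[..., 1], target[..., 1])
    x2 = torch.min(pred[..., 2], target[..., 2])
    y2 = torch.min(pred[..., 3], target[..., 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    union = area_p + area_t - inter                 # u in Eq. (1)
    iou = inter / (union + eps)
    # Smallest enclosing rectangle, area A^c in Eq. (1)
    cx1 = torch.min(pred[..., 0], target[..., 0])
    cy1 = torch.min(pred[..., 1], target[..., 1])
    cx2 = torch.max(pred[..., 2], target[..., 2])
    cy2 = torch.max(pred[..., 3], target[..., 3])
    a_c = (cx2 - cx1) * (cy2 - cy1)
    return 1 - iou + (a_c - union) / (a_c + eps)    # Eq. (1)
```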
However, when one box contains the other, the GIoU loss takes almost the same value as the IoU loss and cannot judge the relative position of the two boxes. For this reason, this study adopted the efficient-IoU (EIoU) bounding-box loss function [36] and compared it with the complete-IoU (CIoU) loss function [37]. The EIoU loss is the sum of three terms, L_IoU, L_dis, and L_asp, which represent the overlap loss, the center-distance loss, and the width-height loss between the predicted and ground-truth boxes, respectively, as shown in (2):
$$ L_{EIoU}(b, b^{gt}) = L_{IoU} + L_{dis} + L_{asp} = 1 - IoU(b, b^{gt}) + \frac{\rho^{2}(b, b^{gt})}{c^{2}} + \frac{\rho^{2}(w, w^{gt})}{C_{w}^{2}} + \frac{\rho^{2}(h, h^{gt})}{C_{h}^{2}}, \qquad (2) $$
where b and b^gt denote the center points of the prediction box and the ground-truth box, respectively; ρ(·,·) denotes the Euclidean distance; c is the diagonal length of the smallest box enclosing both boxes; w, h, w^gt, and h^gt are the widths and heights of the prediction box and the ground-truth box; and C_w and C_h are the width and height of that smallest enclosing box.
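Equation (2) can likewise be sketched in PyTorch. The implementation below is a didactic reading of the formula (with a small epsilon for numerical stability), not the authors' code.

```python
# Sketch of the EIoU loss in Eq. (2) for boxes given as (x1, y1, x2, y2).
import torch

def eiou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    # Overlap loss L_IoU = 1 - IoU
    x1 = torch.max(pred[..., 0], target[..., 0])
    y1 = torch.max(pred[..., 1], target[..., 1])
    x2 = torch.min(pred[..., 2], target[..., 2])
    y2 = torch.min(pred[..., 3], target[..., 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    w_p, h_p = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    w_t, h_t = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]
    union = w_p * h_p + w_t * h_t - inter
    iou = inter / (union + eps)
    # Smallest enclosing box: side lengths C_w, C_h and squared diagonal c^2
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])
    c2 = cw ** 2 + ch ** 2 + eps
    # Center-distance loss L_dis = rho^2(b, b_gt) / c^2
    dx = (pred[..., 0] + pred[..., 2]) / 2 - (target[..., 0] + target[..., 2]) / 2
    dy = (pred[..., 1] + pred[..., 3]) / 2 - (target[..., 1] + target[..., 3]) / 2
    l_dis = (dx ** 2 + dy ** 2) / c2
    # Width-height loss L_asp
    l_asp = (w_p - w_t) ** 2 / (cw ** 2 + eps) + (h_p - h_t) ** 2 / (ch ** 2 + eps)
    return 1 - iou + l_dis + l_asp  # Eq. (2)
```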

2.4. Evaluation Metrics

To measure the performance of the wheat spikelet detection algorithm, precision (P), recall (R), F1 score, average precision (AP), mean average precision (mAP), and the average detection time per image were used as evaluation indices [38]. Precision, recall, and F1 are calculated as in (3)–(5):
$$ P = \frac{TP}{TP + FP}, \qquad (3) $$
$$ R = \frac{TP}{TP + FN}, \qquad (4) $$
$$ F_{1} = \frac{2 \times P \times R}{P + R}. \qquad (5) $$
Taking the spikelet category as an example, TP is the number of spikelets correctly detected, FN is the number of spikelets the detector missed, and FP is the number of non-spikelet objects incorrectly detected as spikelets. AP and mAP are calculated as in (6) and (7):
$$ AP = \int_{0}^{1} P(R)\, dR, \qquad (6) $$
$$ mAP = \frac{1}{c} \sum_{j=1}^{c} AP_{j}, \qquad (7) $$
where P(R) is the precision-recall curve, AP is the area under that curve, mAP is the average of the AP values over all categories, and c is the total number of categories.
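The detection metrics in Equations (3)–(7) translate directly into code; the counts and per-class AP values below are illustrative, and trapezoidal integration is one simple way to approximate the area under P(R) (detectors often use interpolated variants).

```python
# Sketch of Eqs. (3)-(7): precision, recall, F1 from detection counts, and
# AP as the area under the precision-recall curve.
import numpy as np

def prf1(tp: int, fp: int, fn: int):
    p = tp / (tp + fp)            # Eq. (3)
    r = tp / (tp + fn)            # Eq. (4)
    f1 = 2 * p * r / (p + r)      # Eq. (5)
    return p, r, f1

def average_precision(recall: np.ndarray, precision: np.ndarray) -> float:
    # Integrate P(R) over recall (Eq. (6)); recall must be sorted ascending.
    return float(np.trapz(precision, recall))

p, r, f1 = prf1(tp=95, fp=5, fn=8)          # illustrative counts
ap_values = [0.977, 0.976]                  # illustrative per-class APs
map_ = sum(ap_values) / len(ap_values)      # Eq. (7)
```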
Additionally, beyond detecting and locating spikelets, the final counting results must be analyzed to measure the accuracy and applicability of the algorithm for spikelet counting. The coefficient of determination (R²), mean absolute error (MAE), and root mean squared error (RMSE) were selected as the evaluation indices for counting, defined in (8)–(10):
$$ R^{2} = 1 - \frac{\sum_{i=1}^{n} (t_{i} - p_{i})^{2}}{\sum_{i=1}^{n} (t_{i} - \bar{t})^{2}}, \qquad (8) $$
$$ MAE = \frac{1}{n} \sum_{i=1}^{n} \left| t_{i} - p_{i} \right|, \qquad (9) $$
$$ RMSE = \sqrt{\frac{\sum_{i=1}^{n} (t_{i} - p_{i})^{2}}{n}}, \qquad (10) $$
where n is the number of test-set images involved in the assessment, t_i and p_i are the numbers of spikelets counted manually and by the algorithm in image i, respectively, and t̄ is the average number of spikelets per image.
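Equations (8)–(10) translate directly into NumPy; the counts below are illustrative values.

```python
# Sketch of Eqs. (8)-(10): t are manual counts, p are model counts per image.
import numpy as np

def counting_metrics(t: np.ndarray, p: np.ndarray):
    r2 = 1 - np.sum((t - p) ** 2) / np.sum((t - t.mean()) ** 2)  # Eq. (8)
    mae = np.mean(np.abs(t - p))                                 # Eq. (9)
    rmse = np.sqrt(np.mean((t - p) ** 2))                        # Eq. (10)
    return r2, mae, rmse

t = np.array([18, 20, 19, 21], dtype=float)   # illustrative manual counts
p = np.array([18, 19, 19, 22], dtype=float)   # illustrative model counts
print(counting_metrics(t, p))
```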

3. Results

3.1. Experimental Environment and Parameter Configuration

The computational platform in this study was configured as follows: a desktop with an Intel(R) Core(TM) i7-12700F CPU @ 2.10 GHz, an NVIDIA GeForce RTX 3060 Ti graphics card, 16 GB of RAM, and 8 GB of video memory. The experiments were performed in the PyTorch 1.13 deep learning framework with CUDA 11.7 and Python 3.9 running in PyCharm. The batch size was set to 16, and the number of training epochs was set to 300.
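For reference, a plausible way to reproduce this configuration with the stock YOLOv5 training entry point is sketched below; the dataset YAML file name is hypothetical, and the call assumes the script is run from the ultralytics/yolov5 repository root.

```python
# Hypothetical reproduction of the training configuration above via the
# standard YOLOv5 train.py programmatic interface.
import train  # yolov5/train.py from the ultralytics/yolov5 repository

train.run(
    data="wheat_spikelet.yaml",  # assumed dataset config (paths + 2 classes)
    weights="yolov5s.pt",        # pretrained YOLOv5s checkpoint
    imgsz=640,
    batch_size=16,               # as configured above
    epochs=300,                  # as configured above
)
```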

3.2. Model Training and Performance Analysis

The model was trained on the spikelet dataset described in Section 2.1 under the same computational platform and parameter configuration. Figure 7 first examines the convergence of training. In general, YOLOv5s-T(CIoU) and YOLOv5s-T(EIoU) behaved very similarly in both training loss and mAP. In Figure 7b, the training loss gradually converges to less than 0.3 after tens of epochs and flattens after roughly 50 epochs. Correspondingly, the evolution of the mean average precision during training is shown in Figure 7a: once training converges, the mAP is stable at approximately 98%. This indicates that, after sufficient training, the proposed model can achieve a promising detection performance.
Furthermore, Tables 1 and 2 compare the well-trained YOLOv5s-T(CIoU) and YOLOv5s-T(EIoU) against the original YOLOv5s. As seen in Table 1, the original YOLOv5s slightly exceeds YOLOv5s-T in precision, recall, mAP, and F1, whereas YOLOv5s-T is significantly better in processing time per image. YOLOv5s-T(CIoU) and YOLOv5s-T(EIoU) are very similar on the metrics in Table 1, none of which are heavily reduced relative to the original YOLOv5s. The longer processing time of the original YOLOv5s is mainly due to the redundant convolution operations and larger model volume that were slimmed down in Section 2. Table 2 compares the training metrics of the different models and shows a significant reduction in model size.
Tables 1 and 2 show that the mAPs of YOLOv5s-T(CIoU) and YOLOv5s-T(EIoU) are 97.70% and 97.43%, respectively, with no significant decrease in detection accuracy, and that both achieve a 25% decrease in detection time compared to YOLOv5s. In addition, the model size and number of parameters are significantly reduced, the inference time is shortened by 2.3 ms, the GPU memory usage during training is reduced by 0.82 GB, and the computation at a 640 × 640 input size is reduced by 6.1 GFLOPs. The YOLOv5s-T model therefore requires fewer hardware resources during training, is better suited to deployment on low-compute device platforms, and can better support wheat spikelet counting in large field environments. YOLOv5s-T(EIoU) matches YOLOv5s-T(CIoU) on these metrics because they share the same model architecture. Images of wheat ears at different fertility stages were then selected from the test set to assess spikelet detection, with the results shown in Figure 8.

3.3. Comparison of Spikelet Count Results on the Test Set

To evaluate the practical effectiveness of the model for wheat spikelet counting, this section compares the model's detection results with manual counts on the test set.
In Figure 9, a linear fit was performed between the predicted and true values of YOLOv5s; the R² was 0.89, and this was used as the reference for evaluating YOLOv5s-T. In Figure 10, the same analysis was applied to YOLOv5s-T(CIoU) and YOLOv5s-T(EIoU). The specific R², MAE, and RMSE values of YOLOv5s, YOLOv5s-T(CIoU), and YOLOv5s-T(EIoU) are listed in Table 3. As seen in Figure 10 and Table 3, YOLOv5s-T(EIoU) is closer to YOLOv5s and better than YOLOv5s-T(CIoU) on these metrics. In general, both YOLOv5s and YOLOv5s-T achieved favorable MAE and RMSE, indicating that the proposed method can effectively achieve low-error wheat spikelet counting.
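The per-image comparison can be reproduced as follows: a least-squares line relates manual and predicted counts, and R² is computed as in Equation (8). The count values are illustrative.

```python
# Sketch of the predicted-vs-true count comparison (NumPy only).
import numpy as np

manual = np.array([16, 18, 20, 19, 22], dtype=float)     # t_i (true counts)
predicted = np.array([15, 18, 19, 20, 21], dtype=float)  # p_i (model counts)

slope, intercept = np.polyfit(manual, predicted, deg=1)  # linear fit
r2 = 1 - np.sum((manual - predicted) ** 2) / np.sum((manual - manual.mean()) ** 2)
print(f"fit: p = {slope:.2f} t + {intercept:.2f}, R^2 = {r2:.2f}")
```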

3.4. Results of Spikelet Counts on Different Fertility Stages

To further analyze the effect of the proposed wheat spikelet counting method at different fertility stages, the test-set images were divided by fertility stage: 36 images of the flowering stage, 75 of the grain filling stage, and 65 of the maturity stage. Linear regression was used to fit the manually counted and model-predicted spikelet numbers for single wheat images at each stage. The fitting results are shown in Figure 11, and the performance metrics in Table 4.
From Figure 11, the spikelet counts on the flowering-stage images show the best fit among the three fertility stages: the R² of YOLOv5s-T(EIoU) reaches 0.97 and that of YOLOv5s-T(CIoU) is 0.96. In contrast, performance at the filling and maturity stages is lower, with the maturity stage performing worst. This suggests that wheat spikelet counting is most effective at the flowering stage, when spikelet adhesion is lowest. As shown in Table 4, the R² of YOLOv5s-T(EIoU) was higher than that of YOLOv5s-T(CIoU) for all three fertility stages. The MAE and RMSE of YOLOv5s-T(EIoU) are slightly higher than those of YOLOv5s-T(CIoU) in the grain filling stage, and the opposite holds in the other two stages. Overall, the performance of YOLOv5s-T(EIoU) is more stable, mainly because it is less susceptible to abnormal samples.

4. Discussion

4.1. Deep Learning for Wheat Spikelet Counting

The experimental results demonstrate that the proposed YOLOv5s-T achieves a desirable mAP while remaining lightweight for wheat spikelet counting. mAP and R² have been the primary metrics valued in past work on wheat spikelet counting and, more generally, small object detection, and many works have focused on them [27,38,39,40]. In [27], Qiu et al. proposed a model based on Faster R-CNN; the R² values between the true and estimated numbers for four wheat varieties were 0.67, 0.73, 0.78, and 0.84. In [39], Hasan et al. also proposed an R-CNN-based model and achieved a high R² of 0.93. In [38], Yang et al. combined CBAM with YOLOv4, proposing a CBAM-YOLOv4 model to enhance the network's feature extraction capability; the mean average precision for wheat ear detection reached 96.40%, and the R² between true and predicted values was 0.955. In [40], Misra et al. proposed a deep learning approach using an encoder-decoder network that achieved an average detection precision of 99.93%.
These works demonstrate efforts to improve model accuracy and detection precision for small target detection, and they reveal the potential of deep learning for wheat spikelet counting. However, as noted earlier, encouraged by emerging computing and network technologies such as multi-access edge computing, smart agriculture requires a wheat spikelet counting model that can be deployed rapidly in large-scale field scenarios. Consider a grid-based smart agriculture setup containing many cells, in each of which intelligent terminals such as self-propelled field robots with image processing and on-device training capabilities must be deployed. In such a system, large models will likely exceed the available mobile computing capacity, storage space, and energy budget. Even setting aside on-device learning, the equipment may need to carry multiple algorithmic schemes to handle the variability of wheat growth stages. This consistently calls for a lightweight model. Moreover, real-time yield estimation requires these terminals to continuously capture images and run inference; a lightweight model improves operational efficiency, as seen in the experiments of this study, where the lightweight YOLOv5s-T reduces the processing time by approximately 25% compared to YOLOv5s, a considerable improvement for real-time processing. Ensuring detection accuracy while lightening the deep learning model for real-time detection scenarios was therefore the main motivation of this study. The experimental results show that the R² between the predicted and true spikelet counts over all three fertility stages is 0.87 and the root mean square error (RMSE) is 0.70, while the model size was reduced by 36.8%, the GPU memory usage during training was reduced by 0.82 GB, and the processing time was reduced by 25% compared with the original YOLOv5s model.

4.2. The Influence Factors and the Future

This study focused on counting the number of spikelets per spike. Therefore, during dataset construction, other wheat ears were separated from the photographed wheat ear as much as possible. However, some spikelets in the background were still detected, as shown in Figure 12a, which interfered with the counting results and increased the statistical error. In addition, we found that spikelet shape is another important influencing factor. The spikelets at the flowering stage, although closely arranged, were less shaded and adhered; hence, the counting accuracy in this period is higher than in the filling and maturity stages. At maturity, the spikelets changed their morphology as the seeds expanded, further tightening the spikelet arrangement, as shown in Figure 12b. This resulted in a gradual decrease in counting accuracy from flowering to filling to maturity.
In future work, wheat images of more varieties and growth periods will be added to enrich the dataset and reduce the model's false detections of background spikelets. In addition, spikelet degeneration is very common in cereals and is a major constraint on grain production [41]; a finer classification will therefore be performed to distinguish normal from degenerate spikelets and further improve detection accuracy.

5. Conclusions

In this paper, wheat spike datasets were constructed and analyzed, including smartphone-captured field images of wheat ears at the flowering, filling, and mature stages. A lightweight YOLOv5s-T algorithm was realized by reducing the convolution operations in the YOLOv5s backbone network. Compared with the original YOLOv5s algorithm, the spikelet detection and counting performance of YOLOv5s-T on the test set did not significantly decrease, while the computing power demanded of the hardware was significantly reduced. YOLOv5s-T therefore offers a broader application space and can replace manual spikelet counting in the prediction of agricultural wheat yield.
The proposed YOLOv5s-T algorithm achieved spikelet detection and counting in wheat ear images of the three growth stages, but there were still some deficiencies in the spikelet counting results on images in the filling stage and the mature stage. In the future, we will further improve the model, enrich the spikelet dataset, improve the spikelet detection and counting effect in the images of each growth stage, and realize the application in wheat yield estimation.

Author Contributions

Conceptualization, L.S., J.S. and J.W.; methodology, J.S. and Y.D.; software, J.S. and S.Z.; validation, J.S., L.S. and Y.D.; formal analysis, J.S., X.S. and L.X.; investigation, J.S. and L.X.; resources, X.S.; data curation, L.S., J.S. and Y.D.; writing—original draft preparation, J.S. and Y.D.; writing—review and editing, L.S., J.S. and S.Z.; visualization, J.S.; supervision, L.X.; project administration, L.S.; funding acquisition, L.S. and J.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (No. 31501225), the Natural Science Foundation of Henan Province of China (No. 222300420463), and the Joint Fund of the Science and Technology Research and Development Plan of Henan Province (No. 222301420113).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data will be made available on request.

Acknowledgments

The authors are thankful to Xinming Ma for his strong support for this work. The authors would like to thank the editor and anonymous reviewers for their helpful comments and suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Chlingaryan, A.; Sukkarieh, S.; Whelan, B. Machine learning approaches for crop yield prediction and nitrogen status estimation in precision agriculture: A review. Comput. Electron. Agric. 2018, 151, 61–69.
  2. Sakamoto, T.; Gitelson, A.A.; Arkebauer, T.J. Near real-time prediction of U.S. corn yields based on time-series MODIS data. Remote Sens. Environ. 2014, 147, 219–231.
  3. Kowalik, W.; Dabrowska-Zielinska, K.; Meroni, M.; Raczka, T.U.; de Wit, A. Yield estimation using SPOT-VEGETATION products: A case study of wheat in European countries. Int. J. Appl. Earth Obs. Geoinf. 2014, 32, 228–239.
  4. Kuwata, K.; Shibasaki, R. Estimating crop yields with deep learning and remotely sensed data. In Proceedings of the 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Milan, Italy, 26–31 July 2015; pp. 858–861.
  5. Palazzini, J.; Fumero, V.; Yerkovich, N.; Barros, G.; Cuniberti, M.; Chulze, S. Correlation between Fusarium graminearum and Deoxynivalenol during the 2012/13 Wheat Fusarium Head Blight Outbreak in Argentina. Cereal Res. Commun. 2015, 43, 627–637.
  6. Vahamidis, P.; Karamanos, A.J.; Economou, G. Grain number determination in durum wheat as affected by drought stress: An analysis at spike and spikelet level. Ann. Appl. Biol. 2019, 174, 190–208.
  7. Matsuyama, H.; Ookawa, T. The effects of seeding rate on yield, lodging resistance and culm strength in wheat. Plant Prod. Sci. 2020, 23, 322–332.
  8. Bolton, D.K.; Friedl, M.A. Forecasting crop yield using remotely sensed vegetation indices and crop phenology metrics. Agric. For. Meteorol. 2013, 173, 74–84.
  9. Zhao, S.; Gu, J.; Zhao, Y.; Hassan, M.; Li, Y.; Ding, W. A method for estimating spikelet number per panicle: Integrating image analysis and a 5-point calibration model. Sci. Rep. 2015, 5, 16241.
  10. Lin, T.L.; Chang, H.Y.; Chen, K.H. The Pest and Disease Identification in the Growth of Sweet Peppers Using Faster R-CNN and Mask R-CNN. J. Internet Technol. 2020, 21, 605–614.
  11. Ahmad Loti, N.N.; Mohd Noor, M.R.; Chang, S.W. Integrated analysis of machine learning and deep learning in chili pest and disease identification. J. Sci. Food Agric. 2021, 101, 3582–3594.
  12. Manavalan, R. Automatic identification of diseases in grains crops through computational approaches: A review. Comput. Electron. Agric. 2020, 178, 105802.
  13. Fernandez-Gallego, J.A.; Lootens, P.; Borra-Serrano, I.; Derycke, V.; Haesaert, G.; Roldan-Ruiz, I.; Araus, J.L.; Kefauver, S.C. Automatic wheat ear counting using machine learning based on RGB UAV imagery. Plant J. 2020, 103, 1603–1613.
  14. Hassan, S.I.; Alam, M.M.; Zia, M.Y.I.; Rashid, M.; Illahi, U.; Su'ud, M.M. Rice Crop Counting Using Aerial Imagery and GIS for the Assessment of Soil Health to Increase Crop Yield. Sensors 2022, 22, 8567.
  15. Zhou, C.; Ye, H.; Hu, J.; Shi, X.; Hua, S.; Yue, J.; Xu, Z.; Yang, G. Automated Counting of Rice Panicle by Applying Deep Learning Model to Images from Unmanned Aerial Vehicle Platform. Sensors 2019, 19, 3106.
  16. Jin, X.; Che, J.; Chen, Y. Weed Identification Using Deep Learning and Image Processing in Vegetable Plantation. IEEE Access 2021, 9, 10940–10950.
  17. Tang, J.; Wang, D.; Zhang, Z.; He, L.; Xin, J.; Xu, Y. Weed identification based on K-means feature learning combined with convolutional neural network. Comput. Electron. Agric. 2017, 135, 63–70.
  18. Wang, H.; Cimen, E.; Singh, N.; Buckler, E. Deep learning for plant genomics and crop improvement. Curr. Opin. Plant Biol. 2020, 54, 34–41.
  19. Kolhar, S.; Jagtap, J. Phenomics for Komatsuna plant growth tracking using deep learning approach. Expert Syst. Appl. 2023, 215, 119368.
  20. Qiu, R.; Wei, S.; Zhang, M.; Li, H.; Sun, H.; Liu, G.; Li, M. Sensors for measuring plant phenotyping: A review. Int. J. Agric. Biol. Eng. 2018, 11, 1–17.
  21. Zhou, H.; Riche, A.B.; Hawkesford, M.J.; Whalley, W.R.; Atkinson, B.S.; Sturrock, C.J.; Mooney, S.J. Determination of wheat spike and spikelet architecture and grain traits using X-ray Computed Tomography imaging. Plant Methods 2021, 17, 26.
  22. Yang, W.; Nigon, T.; Hao, Z.; Paiao, G.D.; Fernandez, F.G.; Mulla, D.; Yang, C. Estimation of corn yield based on hyperspectral imagery and convolutional neural network. Comput. Electron. Agric. 2021, 184, 106092.
  23. Pound, M.P.; Atkinson, J.A.; Wells, D.M.; Pridmore, T.P.; French, A.P. Deep Learning for Multi-task Plant Phenotyping. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2055–2063.
  24. Fernandez-Gallego, J.A.; Luisa Buchaillot, M.; Aparicio Gutierrez, N.; Teresa Nieto-Taladriz, M.; Luis Araus, J.; Kefauver, S.C. Automatic Wheat Ear Counting Using Thermal Imagery. Remote Sens. 2019, 11, 751.
  25. Liu, S.; Baret, F.; Andrieu, B.; Burger, P.; Hemmerle, M. Estimation of Wheat Plant Density at Early Stages Using High Resolution Imagery. Front. Plant Sci. 2017, 8, 739.
  26. Khoroshevsky, F.; Khoroshevsky, S.; Bar-Hillel, A. Parts-per-Object Count in Agricultural Images: Solving Phenotyping Problems via a Single Deep Neural Network. Remote Sens. 2021, 13, 2496.
  27. Qiu, R.; He, Y.; Zhang, M. Automatic Detection and Counting of Wheat Spikelet Using Semi-Automatic Labeling and Deep Learning. Front. Plant Sci. 2022, 13, 872555.
  28. Lavanya, G.; Rani, C.; Ganeshkumar, P. An automated low cost IoT based Fertilizer Intimation System for smart agriculture. Sustain. Comput. Inform. Syst. 2020, 28, 100300.
  29. Wang, Y.; Tao, X.; Zhang, X.; Zhang, P.; Hou, Y.T. Cooperative Task Offloading in Three-Tier Mobile Computing Networks: An ADMM Framework. IEEE Trans. Veh. Technol. 2019, 68, 2763–2776.
  30. Sun, J.; Wang, H.; Feng, G.; Lv, H.; Liu, J.; Gao, Z. TOS-LRPLM: A task value-aware offloading scheme in IoT edge computing system. Cluster Comput. 2023, 26, 319–335.
  31. Abbas, N.; Zhang, Y.; Taherkordi, A.; Skeie, T. Mobile Edge Computing: A Survey. IEEE Internet Things J. 2018, 5, 450–465.
  32. Qi, J.; Liu, X.; Liu, K.; Xu, F.; Guo, H.; Tian, X.; Li, M.; Bao, Z.; Li, Y. An improved YOLOv5 model based on visual attention mechanism: Application to recognition of tomato virus disease. Comput. Electron. Agric. 2022, 194, 106780.
  33. Wu, D.; Lv, S.; Jiang, M.; Song, H. Using channel pruning-based YOLO v4 deep learning algorithm for the real-time and accurate detection of apple flowers in natural environments. Comput. Electron. Agric. 2020, 178, 105742.
  34. Wang, C.; Liao, H.M.; Wu, Y.; Chen, P.; Hsieh, J.; Yeh, I. CSPNet: A New Backbone that can Enhance Learning Capability of CNN. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020; pp. 1571–1580.
  35. Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 658–666.
  36. Zhang, Y.F.; Ren, W.; Zhang, Z.; Jia, Z.; Wang, L.; Tan, T. Focal and efficient IOU loss for accurate bounding box regression. Neurocomputing 2022, 506, 146–157.
  37. Wang, X.; Song, J. ICIoU: Improved Loss Based on Complete Intersection Over Union for Bounding Box Regression. IEEE Access 2021, 9, 105686–105695.
  38. Yang, B.; Gao, Z.; Gao, Y.; Zhu, Y. Rapid Detection and Counting of Wheat Ears in the Field Using YOLOv4 with Attention Module. Agronomy 2021, 11, 1202.
  39. Hasan, M.M.; Chopin, J.P.; Laga, H.; Miklavcic, S.J. Detection and analysis of wheat spikes using Convolutional Neural Networks. Plant Methods 2018, 14, 100.
  40. Misra, T.; Arora, A.; Marwaha, S.; Jha, R.R.; Ray, M.; Jain, R.; Rao, A.R.; Varghese, E.; Kumar, S.; Kumar, S.; et al. Web-SpikeSegNet: Deep Learning Framework for Recognition and Counting of Spikes From Visual Images of Wheat Plants. IEEE Access 2021, 9, 76235–76247.
  41. Wang, Z.Q.; Zhang, W.Y.; Yang, J.C. Physiological mechanism underlying spikelet degeneration in rice. J. Integr. Agric. 2018, 17, 1475–1481.
Figure 1. Different stage samples of the captured images.
Figure 2. A labeled sample of the dataset.
Figure 3. YOLOv5s-T network structure.
Figure 4. Training pictures with mosaic data augmentation.
Figure 5. Focus slicing operation.
Figure 6. FPN+PAN structure.
Figure 7. Training loss and mAP of the proposed YOLOv5s-T.
Figure 8. Detection results at different fertility stages by YOLOv5s-T(EIoU).
Figure 9. Statistical correlation between the predicted and true values of YOLOv5s.
Figure 10. Statistical correlation between the predicted and true values of the proposed YOLOv5s-T.
Figure 11. Fitting results for the number of spikelets in images of each growth stage: (a) R² of YOLOv5s-T(CIoU) and YOLOv5s-T(EIoU) in the flowering stage; (b) in the grain filling stage; (c) in the mature stage.
Figure 12. Examples of images that affect the accuracy of spikelet counting: (a) detected spikelets in the background; (b) spikelets at the maturity stage.
Table 1. Comparison of performance metrics among different models.

Model            | Precision (%) | Recall (%) | mAP@0.5 (%) | F1 (%) | Processing Time (ms)
YOLOv5s          | 99.21         | 97.22      | 98.52       | 98.20  | 40
YOLOv5s-T(CIoU)  | 98.06         | 95.76      | 97.70       | 96.90  | 30
YOLOv5s-T(EIoU)  | 98.89         | 94.92      | 97.43       | 96.87  | 31

Table 2. Comparison of training metrics among different models.

Model            | Input Size | Parameters  | Inference Time (ms) | Model Size (MB) | GPU Memory Usage (GB) | FLOPs (G)
YOLOv5s          | 640 × 640  | 7.06 × 10⁶  | 7.8                 | 14.4            | 4.14                  | 16.3
YOLOv5s-T(CIoU)  | 640 × 640  | 4.48 × 10⁶  | 5.6                 | 9.1             | 3.32                  | 10.2
YOLOv5s-T(EIoU)  | 640 × 640  | 4.48 × 10⁶  | 5.5                 | 9.1             | 3.32                  | 10.2

Table 3. Comparison of wheat spikelet counting performance between different models.

Model            | R²   | MAE  | RMSE
YOLOv5s          | 0.89 | 0.23 | 0.58
YOLOv5s-T(CIoU)  | 0.84 | 0.35 | 0.74
YOLOv5s-T(EIoU)  | 0.87 | 0.34 | 0.70

Table 4. Comparison of spikelet counting results between YOLOv5s-T(CIoU) and YOLOv5s-T(EIoU) at different growth stages.

Growth Stage        | R² (CIoU) | MAE (CIoU) | RMSE (CIoU) | R² (EIoU) | MAE (EIoU) | RMSE (EIoU)
Flowering stage     | 0.96      | 0.17       | 0.41        | 0.97      | 0.11       | 0.33
Grain filling stage | 0.84      | 0.33       | 0.70        | 0.85      | 0.37       | 0.73
Mature stage        | 0.70      | 0.48       | 0.90        | 0.78      | 0.43       | 0.80

Cite as: Shi, L.; Sun, J.; Dang, Y.; Zhang, S.; Sun, X.; Xi, L.; Wang, J. YOLOv5s-T: A Lightweight Small Object Detection Method for Wheat Spikelet Counting. Agriculture 2023, 13, 872. https://doi.org/10.3390/agriculture13040872

