Article

Multiscale Maize Tassel Identification Based on Improved RetinaNet Model and UAV Images

1 Key Laboratory of Quantitative Remote Sensing in Agriculture of Ministry of Agriculture and Rural Affairs, Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
2 School of Mechanical Engineering, Jiangsu University, Zhenjiang 212000, China
3 School of Chemistry and Bioengineering, University of Science and Technology Beijing, Beijing 100083, China
* Author to whom correspondence should be addressed.
Remote Sens. 2023, 15(10), 2530; https://doi.org/10.3390/rs15102530
Submission received: 14 April 2023 / Revised: 8 May 2023 / Accepted: 9 May 2023 / Published: 11 May 2023

Abstract

The acquisition of maize tassel phenotype information plays a vital role in studying maize growth and improving yield. Unfortunately, detecting maize tassels has proven challenging because of the complex field environment, including image resolution, varying sunlight conditions, plant varieties, and planting density. To address this situation, the present study uses unmanned aerial vehicle (UAV) remote sensing technology and a deep learning algorithm to facilitate maize tassel identification and counting. UAVs are used to collect maize tassel images in experimental fields, and RetinaNet serves as the basic model for detecting maize tassels. Small maize tassels are accurately identified by optimizing the feature pyramid structure in the model and introducing attention mechanisms. We also study how differences in image resolution, brightness, plant variety, and planting density affect the RetinaNet model. The results show that the improved RetinaNet model is significantly better at detecting maize tassels than the original RetinaNet model. The average precision in this study is 0.9717, the precision is 0.9802, and the recall rate is 0.9036. Compared with the original model, the improved RetinaNet improves the average precision, precision, and recall rate by 1.84%, 1.57%, and 4.6%, respectively. Compared with mainstream target detection models such as Faster R-CNN, YOLOX, and SSD, the improved RetinaNet model more accurately detects smaller maize tassels. For equal-area images of differing resolution, maize tassel detection becomes progressively worse as the resolution decreases. We also analyze how detection depends on brightness in the various models. With increasing image brightness, the maize tassel detection worsens, especially for small maize tassels. This paper also analyzes the various models for detecting the tassels of five maize varieties. Zhengdan958 tassels prove the easiest to detect, with R2 = 0.9708, 0.9759, and 0.9545 on 5, 9, and 20 August 2021, respectively. Finally, we use the various models to detect maize tassels under different planting densities. At 29,985, 44,978, 67,466, and 89,955 plants/hm2, the mean absolute errors for detecting Zhengdan958 tassels are 0.18, 0.26, 0.48, and 0.63, respectively. Thus, the detection error increases gradually with increasing planting density. This study thus provides a new method for high-precision identification of maize tassels in farmland and is especially useful for detecting small maize tassels. This technology can be used for high-throughput investigations of maize phenotypic traits.

1. Introduction

Maize is an important food crop, and its yield is essential for food security [1]. Maize is a hermaphroditic crop that is amenable to self-pollination, although self-pollination is not beneficial for the selection of superior seeds. Therefore, it is important to ensure the cross-pollination of maize, which is beneficial for breeding and improves the yield [2,3]. In addition, during maize pollination, the wind blows many pollen grains onto the female panicle. However, the high respiratory capacity of pollen means that these pollen grains consume large amounts of nutrients, thereby competing for nutrients with the female panicle and disfavoring the growth of the latter [4,5,6,7]. At the same time, the long tassel creates more shade, reducing photosynthesis in the maize leaves, which is not conducive to the growth of maize. The overabundance of tassels also hinders pest control and reduces the yield [8,9,10]. Given these effects caused by maize tassels, detasseling contributes significantly to yield improvement. Yang et al. [11] studied how detasseling, with and without removal of the top leaf, affects the photosynthetic characteristics, dry matter accumulation, and yield of maize and showed that this practice increases the area exposed to light in the middle and lower sections of the maize plant. This increases the photosynthetic capacity of maize and facilitates the translocation of photosynthetic products to the grain, thereby increasing yield. In conventional practice, tassels are removed by hand when the maize begins to tassel, detasseling alternate rows or plants so that about half of the male panicles remain, which suffices to pollinate the female panicles. Once maize pollination is complete, the remaining half of the tassels can be removed to reduce occlusion and improve photosynthesis [12,13,14].
However, the traditional method of removing maize tassels relies on manual identification and is time consuming and laborious. The application of computer vision technology to crop identification thus provides a welcome technological means to accurately identify maize tassels and offers efficient scientific guidance for detasseling. Accurately determining the number of maize tassels also provides strong support for quickly assessing the progress of maize tasseling.
Traditional maize tassel recognition is mainly based on color space analysis and machine learning [15,16]. Tang et al. [17] proposed an image segmentation algorithm based on the hue–saturation–intensity color space to extract maize tassels from images and thereby identify them. The results showed that this method can extract maize tassel regions from images and spatially locate the tassels. Kurtulmuş and Kavdir [18] proposed a maize tassel detection algorithm that combines traditional color images with machine learning. The algorithm uses color information and support vector machine classifiers to binarize images, performs morphological analysis to determine candidate tassel locations, uses clustering to merge multiple detections of the same tassel, and determines the final location of the maize tassels. The results showed that tassels could be detected in color images. Mao et al. [19] transformed red–green–blue (RGB) maize field images into hue–saturation–intensity space, binarized the hue component, filtered, denoised, and morphologically processed the result to generate rectangular box regions, and then removed false detections by using a learning vector quantization neural network. The results showed that this method improved the segmentation accuracy of maize tassels.
In recent years, with the development of unmanned aerial remote sensing technology [20,21,22,23] and computer vision technology [24,25], agricultural intelligent monitoring systems have been continuously improved. The advantages of high efficiency, convenience, and low cost have made unmanned aerial vehicles (UAVs) popular with many researchers for collecting agricultural data. With efficient deep learning algorithms, computer vision technology is increasingly important in image classification, target detection, and image segmentation [26,27,28,29]. Significant research has been performed on maize tassel recognition based on UAV remote sensing technology and deep learning algorithms. For example, Lu et al. [30] proposed the TasselNet model to count maize tassels by using a local regression network and made the maize tassel dataset public. Liang et al. [31] proposed the SSD MobileNet model, which is deployed on a UAV system and is ideally suited for recognizing maize tassels. Yang et al. [32] improved the anchor-free CenterNet [33] target detection model to efficiently detect maize tassels. Liu et al. [34] used the Faster R-CNN model to detect maize tassels by replacing different main feature extraction networks. They concluded that the residual network (ResNet) [35] works better as a feature extraction network for maize tassels than the visual geometry group network (VGGNet) [36].
Although significant research has focused on detecting maize tassels [37], the complexity of the field environment and the many factors that interfere with detection, including image resolution, brightness, tassel variety, and planting density, make this a challenging task [30]. In fact, maize tassels are often small targets [38,39,40], so their detection places high demands on the small-target capability of the detection model. Target detection models based on convolutional neural networks may be divided into one-stage and two-stage networks. One-stage target detection algorithms include SSD [41], the YOLO series [42,43,44,45,46], and RetinaNet [47]. Two-stage target detection networks mainly include Faster R-CNN [48].
RetinaNet has many structures in its model that are useful for small-target detection. For example, its anchor frames come in many sizes, ensuring detection accuracy for large and small targets. Moreover, RetinaNet uses a feature pyramid network (FPN) [49], which is very useful for expressing small-target feature information. Therefore, this model has been used to detect numerous small targets. For example, Li et al. [50] accurately detected wheat ears by using the RetinaNet model and found that the algorithm accuracy exceeds that of the Faster R-CNN network. In other work, Chen et al. [51] improved the RetinaNet model to improve the identification accuracy of flies. Therefore, this study uses the RetinaNet model for maize tassel detection and improves the recognition accuracy of smaller tassels by optimizing the model and introducing an attention mechanism. In addition, we analyze how image resolution, brightness, plant varieties, and planting density affect the detection model.

2. Materials and Methods

2.1. Field Experiments

The experimental data were collected at the National Precision Agriculture Research and Demonstration Base in Xiaotangshan Town, Changping District, Beijing, China, at 36 m above sea level. The whole trial area contained 80 plots, with each plot measuring 2.5 m × 3.6 m. This experiment used five maize varieties of different genetic backgrounds: Nongkenuo336 (A1), Jingjiuqingzhu16 (A2), Tianci19 (A3), Zhengdan958 (A4), and Xiangnuo2008 (A5). The agronomic characteristics of each maize variety are given in Table 1. Each maize variety was planted at four planting densities: 29,985, 44,978, 67,466, and 89,955 plants/hm2 and repeated twice. The trial was sown on 11 June 2021 and harvested on 11 September 2021.

2.2. Data Acquisition and Preprocessing

We used a DJI Mavic 2 portable UAV to collect images from the 80 plots in the trial on 9 August 2021 (13:30–14:00), when all maize had entered the silking stage. Data were collected in cloudy, windless conditions to reduce the effects of illumination variation and maize plant oscillation on detection. The UAV was equipped with a 20-megapixel Hasselblad camera and flew at an altitude of 10 m, with 80% forward overlap and 80% lateral overlap, a ground pixel resolution of 0.2 cm, and an image resolution of 5472 pixels × 3648 pixels, for a total of 549 images of maize tassels. The images acquired by the UAV were divided into training and validation sets and a test set, and the trial area was organized as shown in Figure 1. Given the large size of each image, the maize tassels were relatively dense and occupied a small pixel area, making it impossible to train on and detect in the images directly. To ensure the speed of network training in the later stage, the acquired images were cropped to a size of 600 pixels × 600 pixels.
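As a rough illustration of this preprocessing step, the following sketch crops a large UAV image into fixed-size tiles. The 600 × 600 pixel tile size follows the setting above, while the file paths and naming scheme are hypothetical and not taken from the study's actual pipeline.

```python
from pathlib import Path
from PIL import Image

def crop_to_tiles(image_path, out_dir, tile=600):
    """Crop a large UAV image into non-overlapping tile x tile patches."""
    img = Image.open(image_path)
    w, h = img.size
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for top in range(0, h - tile + 1, tile):
        for left in range(0, w - tile + 1, tile):
            patch = img.crop((left, top, left + tile, top + tile))
            patch.save(out_dir / f"{Path(image_path).stem}_{top}_{left}.jpg")

# Example with hypothetical paths:
# crop_to_tiles("DJI_0001.JPG", "tiles/")
```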
To analyze how variety and planting density affect tassel detection, we ensured an equal amount of data for training and validation for the different varieties and planting densities. Therefore, for the training and validation sets, 80 images of tassels of each maize variety at each planting density were taken as the training set for a total of 1600 images for the five varieties and four planting densities. For the validation set, 10 images of tassels were acquired for each maize variety and each planting density for a total of 200 images. For the test set, 16 images were acquired of tassels for each maize variety and planting density for a total of 320 images. In addition, we acquired images of the test set area at a height of 5 m. A total of 12 images was acquired to verify the effect of different resolutions on the detection effectiveness of the model.
LabelImg is an open-source image labeling tool that saves the labeled targets as XML files in the PASCAL VOC format used by ImageNet. The XML file saved after labeling each image contains the width and height of the image and the number of channels, as well as the category of the target and the coordinates of the top-left and bottom-right vertices of the target bounding box.
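For readers unfamiliar with the PASCAL VOC format produced by LabelImg, the following is a minimal parsing sketch; the tag layout follows the standard VOC schema, and the class name "tassel" is only an illustrative assumption.

```python
import xml.etree.ElementTree as ET

def parse_voc_xml(xml_path):
    """Read image size and bounding boxes from a LabelImg (PASCAL VOC) XML file."""
    root = ET.parse(xml_path).getroot()
    size = root.find("size")
    width = int(size.find("width").text)
    height = int(size.find("height").text)
    boxes = []
    for obj in root.iter("object"):
        name = obj.find("name").text             # target category, e.g., "tassel"
        bb = obj.find("bndbox")
        xmin = int(float(bb.find("xmin").text))  # top-left corner
        ymin = int(float(bb.find("ymin").text))
        xmax = int(float(bb.find("xmax").text))  # bottom-right corner
        ymax = int(float(bb.find("ymax").text))
        boxes.append((name, xmin, ymin, xmax, ymax))
    return width, height, boxes
```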

2.3. Model Description

RetinaNet is a typical one-stage object detection algorithm that is widely used in various fields. The RetinaNet model consists mainly of a feature extraction module, an FPN module, a classification module, and a regression module. The feature extraction module mainly uses the residual network. When an image is input, the Conv3_x, Conv4_x, and Conv5_x layers are output by the feature extraction network and then passed to the FPN module for feature fusion. Finally, five feature maps, P3–P7, are output to the classification and regression module for target prediction. In addition, the RetinaNet network uses focal loss as the loss function, which solves the imbalance between positive and negative samples during training. Figure 2 shows the RetinaNet network structure.
(1)
Backbone Network
The RetinaNet model uses the ResNet network as the backbone feature extraction network, which consists mainly of residual blocks. Before the residual network was proposed, deeper networks suffered from vanishing gradients and degradation; in other words, the training loss increased as the number of network layers increased, and the network performed worse. Introducing the residual block allowed the network to be built deeper, improving the results. The two main types of residual block structures are BasicBlock and BottleNeck, which are shown in Figure 3.
(2)
Feature Pyramid Network
In the RetinaNet model, the FPN enhances the feature extraction capability of the network by combining high-level semantic features with underlying features. The FPN up-samples and stacks the feature maps C3–C5 output from the feature extraction network to produce the feature-fused P3–P5 feature maps. P6 is obtained by convolving C5 once, with a kernel size of 3 × 3 and a stride of two. P7 is obtained by applying the ReLU activation function to P6 and convolving once, with a kernel size of 3 × 3 and a stride of two. The smaller size of the upper-level feature map facilitates the detection of larger objects, and the larger size of the lower-level feature map facilitates the detection of smaller objects. The result is five feature maps, P3–P7, where P3–P5 are generally used for small-target detection and P6 and P7 for large-target detection.
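To make the pyramid construction concrete, below is a minimal PyTorch sketch of the classical P3–P7 pathway described above. The 256-channel pyramid width is the usual RetinaNet default and is an assumption here, not a value taken from the paper.

```python
import torch.nn as nn
import torch.nn.functional as F

class ClassicFPN(nn.Module):
    """Sketch of the P3-P7 construction described above (256 channels assumed)."""
    def __init__(self, c3_ch, c4_ch, c5_ch, out_ch=256):
        super().__init__()
        self.lat3 = nn.Conv2d(c3_ch, out_ch, 1)   # lateral 1x1 convolutions
        self.lat4 = nn.Conv2d(c4_ch, out_ch, 1)
        self.lat5 = nn.Conv2d(c5_ch, out_ch, 1)
        self.out3 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.out4 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.out5 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.p6 = nn.Conv2d(c5_ch, out_ch, 3, stride=2, padding=1)   # P6 from C5
        self.p7 = nn.Conv2d(out_ch, out_ch, 3, stride=2, padding=1)  # P7 from ReLU(P6)

    def forward(self, c3, c4, c5):
        # Top-down pathway: up-sample the higher level and fuse with the lateral output.
        m5 = self.lat5(c5)
        m4 = self.lat4(c4) + F.interpolate(m5, size=c4.shape[-2:], mode="nearest")
        m3 = self.lat3(c3) + F.interpolate(m4, size=c3.shape[-2:], mode="nearest")
        p3, p4, p5 = self.out3(m3), self.out4(m4), self.out5(m5)
        p6 = self.p6(c5)
        p7 = self.p7(F.relu(p6))
        return p3, p4, p5, p6, p7
```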
(3)
Classification and Regression Subnet
The classification and regression subnet is a fully convolutional network attached to each FPN level. The main role of this network is to predict, for each anchor frame at each spatial location, the probability that each of the K object classes is present, and to output the object's location relative to the anchor point when the prediction is a positive sample. The classification subnet convolves the feature map with four 3 × 3 convolution kernels, each followed by the ReLU activation function. In the final layer, a 3 × 3 convolution operation is applied, the sigmoid serves as the activation function, and the number of channels in the output is K × A. The regression subnet has a structure like that of the classification subnet, except that the activation function and the number of channels of the final output layer differ. The last layer of the regression subnet uses linear activation, and the number of channels in the output is 4A. Figure 4 shows the structure of the classification and regression subnet.
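The following is a hedged sketch of these two heads. The choice of one class (tassel) and nine anchors per location mirrors the usual RetinaNet configuration and is an assumption, not a value stated in the paper.

```python
import torch.nn as nn

def make_head(out_channels, in_ch=256, mid_ch=256):
    """Four 3x3 conv + ReLU blocks followed by a final 3x3 conv, as described above."""
    layers = []
    ch = in_ch
    for _ in range(4):
        layers += [nn.Conv2d(ch, mid_ch, 3, padding=1), nn.ReLU(inplace=True)]
        ch = mid_ch
    layers.append(nn.Conv2d(mid_ch, out_channels, 3, padding=1))
    return nn.Sequential(*layers)

K, A = 1, 9                  # assumed: one class (tassel), nine anchors per location
cls_head = make_head(K * A)  # classification subnet; a sigmoid is applied to its output
reg_head = make_head(4 * A)  # regression subnet; linear (no activation) output
```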
(4)
Loss Function
The loss function comprises two parts: a classification loss and a regression loss. The classification loss uses focal loss as the loss function, which calculates the classification loss over all positive and negative samples. The bounding box regression loss uses Smooth L1 loss as the loss function, which calculates the regression loss over all positive samples. The loss function is:
$$\mathrm{Loss} = \frac{1}{N_{pos}} \sum_{i} L_{cls}^{i} + \frac{1}{N_{pos}} \sum_{j} L_{reg}^{j},$$
where $\mathrm{Loss}$ is the loss function, $N_{pos}$ is the number of positive samples, $L_{cls}^{i}$ is the classification loss of each sample, $i$ runs over all positive and negative samples, $L_{reg}^{j}$ is the regression loss of each positive sample, and $j$ runs over the positive samples.
Focal loss is proposed based on the cross-entropy loss function. The purpose is to solve the positive and negative sample imbalance problem in the target detection task. The cross-entropy loss function is:
$$\mathrm{CE}(p, y) = \begin{cases} -\log(p) & \text{if } y = 1 \\ -\log(1 - p) & \text{otherwise,} \end{cases}$$
where $y = 1$ indicates a positive sample, and $p$ is the predicted probability of the positive class.
To solve the problem of imbalance between positive and negative samples during training, a weighting factor $\alpha$ is added to the cross-entropy function to reduce the weighting of negative samples. However, this approach does not distinguish between hard and easy samples, so another modulation factor $(1 - p_t)^{\beta}$ is introduced, thus reducing the weight of easy samples and making the model focus on hard samples during training. The focal loss function is:
$$L_{cls} = FL(p_t) = -\alpha_t (1 - p_t)^{\beta} \log(p_t),$$
$$p_t = \begin{cases} p & \text{if } y = 1 \\ 1 - p & \text{otherwise,} \end{cases}$$
where $FL$ is the focal loss function, $\alpha_t$ is the weighting factor, $\beta$ is the focusing parameter, and $p$ is the model prediction probability. The Smooth L1 loss is used to calculate the bounding box regression loss as follows:
$$L_{reg} = \mathrm{Smooth}_{L1} = \begin{cases} 0.5\,(box_{pre} - box_{tru})^{2} & \text{if } |box_{pre} - box_{tru}| < 1 \\ |box_{pre} - box_{tru}| - 0.5 & \text{otherwise,} \end{cases}$$
where $box_{pre}$ is the location of the predicted bounding box, and $box_{tru}$ is the location of the ground truth bounding box.
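As an illustration of these two loss terms, here is a minimal PyTorch sketch. The values α = 0.25 and β = 2 are the common defaults from the focal loss paper and are assumptions rather than values reported in this study.

```python
import torch

def focal_loss(pred_logits, targets, alpha=0.25, beta=2.0):
    """Focal loss for binary classification (alpha and beta are assumed defaults)."""
    p = torch.sigmoid(pred_logits)
    p_t = torch.where(targets == 1, p, 1 - p)
    alpha_t = torch.where(targets == 1,
                          torch.full_like(p, alpha),
                          torch.full_like(p, 1 - alpha))
    return (-alpha_t * (1 - p_t) ** beta * torch.log(p_t.clamp(min=1e-6))).sum()

def smooth_l1(box_pre, box_tru):
    """Smooth L1 regression loss as in the equation above."""
    diff = (box_pre - box_tru).abs()
    return torch.where(diff < 1, 0.5 * diff ** 2, diff - 0.5).sum()

# Total loss, normalized by the number of positive anchors as in the Loss equation:
# loss = (focal_loss(cls_logits, cls_targets) + smooth_l1(reg_pred[pos], reg_tgt[pos])) / n_pos
```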

2.4. Optimizing the Feature Pyramid Network

In the RetinaNet algorithm, the classical FPN is top down: the semantic information of the higher levels is passed downward and fused with lower-level features through lateral connections. While this approach enhances the semantic information of the FPN, the higher-level feature maps have passed through many layers of the network, by which point the information about small targets is already very fuzzy. In Figure 2, C5 is the high-level output of the feature extraction network and directly outputs feature map P5 after two convolutions. However, P5 has been convolved many times by this point and contains little target information, especially for small targets. Therefore, this study optimizes the classical FPN in the RetinaNet algorithm by adding a bottom-up route that fuses the low-level localization information with the high-level semantic information, thereby compensating for and enhancing the localization information. The classical FPN output C3_1 is first down-sampled and weight-fused with C4_1, and the fused result is then down-sampled and weight-fused with C5_1. The optimized FPN outputs P3–P5 are obtained by convolving the output of each layer. Figure 5 shows the structure of the optimized FPN.
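The added fusion route described above can be sketched as follows. Since the paper does not specify how the weight fusion is parameterized, learnable softmax-normalized fusion weights are assumed here purely for illustration, as are the 256-channel width and the stride-two convolutions used for down-sampling.

```python
import torch
import torch.nn as nn

class BottomUpAugmentation(nn.Module):
    """Sketch of the added route: C3_1 is down-sampled and fused with C4_1,
    and the result is down-sampled and fused with C5_1 (fusion weights assumed)."""
    def __init__(self, ch=256):
        super().__init__()
        self.down34 = nn.Conv2d(ch, ch, 3, stride=2, padding=1)
        self.down45 = nn.Conv2d(ch, ch, 3, stride=2, padding=1)
        self.w4 = nn.Parameter(torch.ones(2))   # learnable fusion weights (assumption)
        self.w5 = nn.Parameter(torch.ones(2))
        self.out3 = nn.Conv2d(ch, ch, 3, padding=1)
        self.out4 = nn.Conv2d(ch, ch, 3, padding=1)
        self.out5 = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, c3_1, c4_1, c5_1):
        w4 = torch.softmax(self.w4, dim=0)
        w5 = torch.softmax(self.w5, dim=0)
        f4 = w4[0] * c4_1 + w4[1] * self.down34(c3_1)   # fuse C3_1 into C4_1
        f5 = w5[0] * c5_1 + w5[1] * self.down45(f4)     # fuse the result into C5_1
        return self.out3(c3_1), self.out4(f4), self.out5(f5)  # optimized P3-P5
```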

2.5. Attention Mechanisms

The attention mechanism makes the network pay more attention to the desired target information. In this study, the goal of the network was to accurately detect maize tassels, so an attention mechanism was introduced into the network to make the model pay more attention to the characteristic information of maize tassels. Attention mechanisms can be divided into channel attention mechanisms, spatial attention mechanisms, and combined spatial and channel attention mechanisms.
This study used the CBAM attention mechanism [52], which combines the channel attention mechanism and the spatial attention mechanism. The input feature map is first passed through the channel attention module, which performs global maximum pooling and global average pooling on the input feature layer to obtain two channel attention vectors. These two vectors are then passed through a shared multi-layer perceptron with one hidden layer. The two output vectors are summed, and the weight of each channel is normalized to [0, 1] by a sigmoid function to obtain the weighting output of the channel attention module. This weighting is multiplied by the input feature layer to obtain an intermediate feature map, which is then passed into the spatial attention module, where maximum pooling and average pooling are taken over the channels at each feature point. The two pooling results are stacked, and the number of channels is reduced to one by a convolution. The weight of each feature point is then normalized to [0, 1] by a sigmoid function, and the output weighting is multiplied by the input feature layer to obtain the CBAM output. Figure 6 shows the structure of the CBAM.
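Below is a compact PyTorch sketch of a CBAM block matching this description. The channel-reduction ratio of 16 and the 7 × 7 spatial kernel are common defaults from the CBAM paper and are assumptions here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CBAM(nn.Module):
    """Channel attention followed by spatial attention, as described above."""
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(  # shared MLP with one hidden layer (channel module)
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        self.spatial = nn.Conv2d(2, 1, spatial_kernel,
                                 padding=spatial_kernel // 2, bias=False)

    def forward(self, x):
        # Channel attention: global average- and max-pooled vectors through the shared MLP.
        avg = self.mlp(F.adaptive_avg_pool2d(x, 1))
        mx = self.mlp(F.adaptive_max_pool2d(x, 1))
        x = x * torch.sigmoid(avg + mx)
        # Spatial attention: per-pixel mean and max over channels, stacked, then one conv.
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.max(dim=1, keepdim=True)[0]], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```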

2.6. Experimental Environment and Configuration

The experimental environment required for model training includes hardware and software environments. The hardware environment was as follows: CPU, Intel i5-8000; RAM, 64 GB; GPU, NVIDIA GeForce GTX 1080Ti (video memory, 11 GB). The software environment was as follows: operating system, Windows 10; programming language, Python 3.7; deep learning framework, PyTorch 1.2; CUDA version, 10.0. Before training, the model parameters were set to the values given in Table 2.

2.7. Evaluation Metrics

In this study, average precision (AP), precision (P), recall (R), and intersection over union (IOU) were used to evaluate the model detection performance. AP is a comprehensive evaluation metric of maize tassel detection, and its value is a very important indicator of model performance. Precision refers to the fraction of correctly predicted positive samples with respect to all predicted positive samples. Recall refers to the fraction of correctly predicted positive samples with respect to all positive samples. The intersection over union refers to the ratio of the intersection to the union of the prediction bounding box and ground truth. The average precision, precision, recall, and intersection over union are calculated as follows:
$$\mathrm{AP} = \int_{0}^{1} P(R)\,\mathrm{d}R,$$
$$P = \frac{TP}{TP + FP},$$
$$R = \frac{TP}{TP + FN},$$
where TP is the number of positive samples predicted to be positive, TN is the number of negative samples predicted to be negative, FP is the number of negative samples predicted to be positive, and FN is the number of positive samples predicted to be negative. The intersection over union is given by:
$$\mathrm{IOU} = \frac{|A \cap B|}{|A \cup B|},$$
where A is the prediction bounding box area, and B is the ground truth area.
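As a quick reference, the following sketch computes the IOU of two boxes in the (xmin, ymin, xmax, ymax) format used by the PASCAL VOC annotations; it is a generic illustration rather than the authors' evaluation code.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
             + (box_b[2] - box_b[0]) * (box_b[3] - box_b[1]) - inter)
    return inter / union if union > 0 else 0.0

# Precision and recall then follow from the counts above:
# precision = TP / (TP + FP); recall = TP / (TP + FN)
```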
In addition, this study also used counting metrics, mainly because detection metrics do not distinguish well between differences in detection due to different varieties and planting densities. The counting metrics are calculated as follows:
$$R^{2} = 1 - \frac{\sum_{i=1}^{n}(s_i - p_i)^{2}}{\sum_{i=1}^{n}(s_i - \bar{s})^{2}},$$
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n}(s_i - p_i)^{2}}{n}},$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}|s_i - p_i|,$$
where $R^{2}$ is the coefficient of determination (the larger the value of $R^{2}$, the better the prediction of the model, with a maximum value of 1), RMSE is the root-mean-square error (the smaller the value of RMSE, the better the prediction of the model), and MAE is the mean absolute error. Here, $n$ is the number of test images, $s_i$ and $p_i$ are the ground truth number and the number of predicted bounding boxes of maize tassels in image $i$, and $\bar{s}$ is the mean of the ground truth numbers.
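A minimal NumPy sketch of these counting metrics is given below; it is a generic implementation of the formulas above, not the authors' code.

```python
import numpy as np

def counting_metrics(s, p):
    """R^2, RMSE, and MAE between ground-truth counts s and predicted counts p."""
    s, p = np.asarray(s, dtype=float), np.asarray(p, dtype=float)
    ss_res = np.sum((s - p) ** 2)          # residual sum of squares
    ss_tot = np.sum((s - s.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = np.sqrt(np.mean((s - p) ** 2))
    mae = np.mean(np.abs(s - p))
    return r2, rmse, mae
```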

3. Results and Analysis

3.1. Effectiveness Experiments

The experiments mainly used the RetinaNet algorithm and improved maize tassel detection by optimizing the network structure and introducing an attention mechanism. The experimental results appear in Table 3, where RetinaNet represents the original model, RetinaNet-FPN indicates that the FPN structure in RetinaNet was optimized, and RetinaNet-CBAM indicates that the attention mechanism was introduced into RetinaNet. Table 3 shows that, by optimizing the structure of the FPN, RetinaNet improved by 0.8% in average precision, 1.23% in precision, and 2.49% in recall. By introducing the attention mechanism into the RetinaNet algorithm, the model improved by 0.53% in average precision, 0.33% in precision, and 1.61% in recall. By optimizing the FPN structure and introducing the attention mechanism, the RetinaNet model improved by 1.84% in average precision, 1.57% in precision, and 4.6% in recall. This analysis shows that the detection effectiveness of the model improved upon optimizing the FPN structure and introducing the attention mechanism into the RetinaNet algorithm.
The RetinaNet algorithm outputs feature maps at five scales: 80 × 80, 40 × 40, 20 × 20, 10 × 10, and 5 × 5 pixels, which are used to detect small, medium, and large targets. The experiments analyzed the output feature maps of the RetinaNet, RetinaNet-FPN, RetinaNet-CBAM, and improved RetinaNet algorithms, and the results are shown in Figure 7. The results show that, after a test image was input, the trained model output five feature maps of different sizes. The position of the maize tassels appeared clearly on the 80 × 80 and 40 × 40 pixel-scale feature maps. The position of the maize tassels was blurred on the 20 × 20 pixel-scale feature map and was largely indeterminable on the 10 × 10 and 5 × 5 pixel-scale feature maps. This shows that the model mainly uses the feature information from the larger feature maps (i.e., 80 × 80, 40 × 40, and 20 × 20 pixel scale) to locate and detect maize tassels. In contrast, the small feature maps (i.e., 10 × 10 and 5 × 5 pixel scale) do not provide sufficient localization information to detect maize tassels.
Figure 7 shows that optimizing the FPN structure in RetinaNet enhances the feature fusion capability of the model, resulting in a more pronounced representation of the shallow information of small targets. As seen in Figure 7b, the introduction of the attention mechanism in RetinaNet allowed the model to retain more information about target features, enabling the model to focus greater attention on the target. Figure 7d shows that, by optimizing the FPN structure and introducing the CBAM, the model more clearly expressed the feature information of maize tassels.

3.2. Analysis of the Results of Different Models

To further validate the model performance, we compared the improved RetinaNet model with the current mainstream object detection models. We used the same dataset to train the Faster R-CNN, YOLOX, and SSD algorithms and evaluated the performance of the model on the same test set. The results are given in Table 4 and show that the improved RetinaNet model produced an average precision of 97.17%, which is 20.26%, 7.13%, and 7.72% greater than the Faster R-CNN, YOLOX, and SSD models, respectively. For precision, the improved RetinaNet model reached 98.02%, which is 27.91%, 1.07%, and 8.4% greater than the Faster R-CNN, YOLOX, and SSD models, respectively. For recall, the improved RetinaNet model reached 90.36%, which is 19.26%, 9.68%, and 8.12% greater than the Faster R-CNN, YOLOX, and SSD models, respectively. Thus, the improved RetinaNet model outperforms the Faster R-CNN, YOLOX, and SSD models in terms of performance and is very effective for detecting maize tassels.
Figure 8 shows the P–R curves of the Faster R-CNN, YOLOX, and SSD models and the improved RetinaNet model. From Equation (6), the AP value is the area enclosed by the P–R curve and the X-axis. Figure 8 shows that the P–R curve of the improved RetinaNet encloses the largest area, which is indicative of better performance.
To verify the detection capability of the improved RetinaNet model for smaller targets, we tested UAV maize tassel images using the Faster R-CNN, YOLOX, and SSD models and the improved RetinaNet model. To facilitate a comparative analysis, images with a resolution of 5472 × 3648 pixels were cropped to a resolution of 2700 × 1500 pixels, and the results are shown in Figure 9. These results show that the Faster R-CNN, YOLOX, SSD, and improved RetinaNet models could all detect the larger maize tassels on the left side of the figure. However, the Faster R-CNN and SSD models could not distinguish overlapping tassels, such as those at position 1, whereas the YOLOX and improved RetinaNet models could distinguish these overlapping tassels. For the 47 smaller maize tassels on the right side of the image, the Faster R-CNN algorithm detected 6 tassels, the YOLOX algorithm detected 22 tassels, the SSD algorithm detected 12 tassels, and the improved RetinaNet algorithm detected 42 tassels. This shows that the improved RetinaNet algorithm is better at detecting small maize tassels than the other models.

4. Discussion

4.1. Effect of Image Resolution on Detection of Maize Tassels

When using the same UAV to acquire images of maize tassels, different flight altitudes of the UAV result in different image resolutions for the same area. Therefore, to investigate differences in the detection of maize tassels in the same area from images with different resolutions, we used the UAV to acquire images of the test area from an altitude of 5 m. The original images were cropped from 5472 × 3648 to 2000 × 2500 pixels for model detection purposes. We used ENVI software to down-sample the 2000 × 2500 pixel images three times, obtaining images with resolutions of 1000 × 1250, 500 × 625, and 250 × 313 pixels, to simulate the UAV acquiring images of the same area from heights of 10, 20, and 40 m. The results of the improved RetinaNet model for detection appear in Figure 10.
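For readers who wish to reproduce this step without ENVI, the following OpenCV sketch performs equivalent successive down-sampling; it only replicates the idea and is not the exact procedure used in the study.

```python
import cv2

def simulate_resolutions(image_path, factors=(2, 4, 8)):
    """Down-sample an image by the given factors to emulate higher flight altitudes."""
    img = cv2.imread(image_path)
    h, w = img.shape[:2]
    versions = {}
    for f in factors:
        # INTER_AREA is a reasonable choice for shrinking images.
        versions[f] = cv2.resize(img, (w // f, h // f), interpolation=cv2.INTER_AREA)
    return versions

# Example with a hypothetical path:
# low_res = simulate_resolutions("test_area_5m.jpg")
```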
Figure 10 contains a total of 35 maize tassels. In the image with a resolution of 2000 × 2500 pixels, the model detected 33 maize tassels and one false positive. In the image with a resolution of 1000 × 1250 pixels, the model detected 27 maize tassels and one false positive. In the image with a resolution of 500 × 625 pixels, the model detected 23 maize tassels and one false positive. In the image with a resolution of 250 × 313 pixels, the model detected zero maize tassels. Based on this analysis, the model becomes significantly less effective in detecting maize tassels as the image resolution decreases. In low-resolution images, the maize tassels are not easily detected because they contain fewer pixels, so less feature information can be extracted. Future work should investigate how to detect maize tassels at low resolution to improve the detection of maize tassels in low-resolution images.

4.2. Effect of Brightness on Detection of Maize Tassels

Although the improved RetinaNet model already performed well in detecting maize tassels, we further investigated whether the interference of complex backgrounds, such as brightness variations, affects the model's detection capability. In this study, we conducted experiments to elucidate how image brightness affects the detection of maize tassels. The experiments used images of smaller and larger maize tassels from the test dataset and processed the image pixels in a Python program using an exponential transform based on the NumPy library to increase or decrease the brightness of the images. We then used the improved RetinaNet model to detect the maize tassels under different brightness conditions. The results appear in Figure 11.
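One plausible form of such an exponential (gamma-style) brightness transform is sketched below. The paper does not give the exact transform or its parameters, so the function and the example gamma values are assumptions for illustration only.

```python
import numpy as np

def adjust_brightness(img, gamma):
    """Exponential (gamma-style) brightness transform on a uint8 RGB image.
    gamma < 1 brightens the image, gamma > 1 darkens it; the exact transform used
    in the study is not specified, so this is an illustrative form."""
    x = img.astype(np.float32) / 255.0
    out = np.power(x, gamma)          # element-wise exponential transform
    return (out * 255.0).clip(0, 255).astype(np.uint8)

# Example with assumed gamma values:
# brighter = adjust_brightness(image, 0.75)
# darker = adjust_brightness(image, 1.5)
```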
Figure 11 shows that the detection by the improved RetinaNet model of maize tassels at different brightness levels depended significantly on the brightness. Figure 11(a1)–(a7) show the detection of smaller maize tassels. When the brightness was −75%, the improved RetinaNet model detected fewer maize tassels. When the brightness was −50% or −25%, the improved RetinaNet model detected a relatively constant number of maize tassels with a detection accuracy approaching 100%. With enhanced brightness (25%, 50%, and 75%), the RetinaNet model again detected fewer maize tassels, with the number of missed tassels increasing with increasing brightness. Figure 11(b1)–(b7) show the detection by the improved RetinaNet model of larger maize tassels. These results show that, as the brightness decreases, fewer maize tassels are detected, and, as the brightness increases, only larger maize tassels are detected.
The above analysis shows that the improved RetinaNet model has difficulty detecting maize tassels at different levels of brightness. As the brightness is reduced, the detection of maize tassels becomes progressively worse; as the brightness increases, smaller maize tassels are less often detected than larger ones. The main reason for this is that excessive brightness severely increases the interference from the image background, making it less likely that small maize tassels are detected. These experimental results have important implications for the detection of small maize tassels in complex scenarios. In practical scenarios, the effect of light intensity on detection must be considered when using UAV remote sensing and AI detection techniques for maize tassel detection.

4.3. Effect of Plant Variety on Detection of Maize Tassels

This study was designed to investigate how different maize varieties and the resulting maize tassel morphology affect the detection by the improved RetinaNet model. The experiment was carried out on 5, 9, and 20 August 2021 using the same UAV to obtain images of maize tassels from the same height (10 m). As of 5 August 2021, all maize varieties had already entered the filling stage, and, on 20 August 2021, all maize tassels had entered the milking stage. In other words, before these data were acquired, all maize tassels had already completed pollination, so their status remained largely unchanged during these experiments.
Figure 12 shows the morphology of maize tassels of different varieties. This study used maize tassel data acquired by the UAV on 9 August 2021 as the dataset to train the model used herein, and the test set of 320 images may be divided into five groups based on plant variety, with 64 images in each group. The maize tassel images acquired on 5 and 20 August were filtered, and 320 images of maize tassels were obtained in the same way in the same area as the test set acquired on 9 August. The images were divided by plant variety into five datasets, with 64 images in each group. After preparing the datasets, the maize tassels were detected by the improved RetinaNet model, and the results are shown in Figure 13.
For the Zhengdan958 variety, the experimental results gave R2 = 0.9708, 0.9759, and 0.9654 for 5, 9, and 20 August, respectively, and RMSE = 0.7536, 0.7215, and 0.8523, respectively. Among the five varieties, the model was most effective for detecting maize tassels of Zhengdan958. For the Jingjiuqingzhu16 variety, the improved RetinaNet model produced R2 = 0.8242, 0.8302, and 0.7881 and RMSE = 3.0136, 2.9263, and 3.5326, respectively. Of the five varieties, the most difficult to detect was Jingjiuqingzhu16. Zhengdan958, Nongkenuo336, and Tianci19 all had better detection results, whereas Jingjiuqingzhu16 and Xiangnuo2008 had poorer detection results. The main reason is that the tassel branches of Jingjiuqingzhu16 and Xiangnuo2008 are thinner and fewer, so these tassels are more likely to be missed, whereas Zhengdan958, Nongkenuo336, and Tianci19 are more easily detected because their tassels have larger and more numerous branches. The above analysis shows that differences in the tassel morphology of the different maize varieties can affect the efficiency of model detection. In practical applications, the flight height of the UAV can be adjusted to account for the variety of maize tassels, or the data can be augmented during data processing to improve the detection results.

4.4. Effect of Planting Density on Detection of Maize Tassels

To investigate how the planting density affects the detection of maize tassels, 320 images acquired on 9 August 2021 from the sample area were classified according to the planting density, producing 16 images for each variety and each planting density. To verify how the planting density affects maize tassel detection by the improved RetinaNet model, we examined images of maize tassels at different planting densities for given plant varieties and calculated the MAE. In fact, we conducted thinning work at the seedling stage to ensure that the strongest plants were left at each point and that there were no shortages or excess seedlings, so the planting density was stable and reliable. The recommended planting density for Nongkenuo336 is 45,000–52,500 plants/hm2, 67,500–75,000 plants/hm2 for Jingjiuqingzhu16, 60,000–67,500 plants/hm2 for Tianci19, 82,500–90,000 plants/hm2 for Zhengdan958, and 52,500–60,000 plants/hm2 for Xiangnuo2008. Figure 14 shows the experimental results.
The results in Figure 14 show that, at the same planting density, the MAE varied widely between plant varieties. At planting densities of 29,985, 44,978, 67,466, and 89,955 plants/hm2, the respective MAEs of the model were 0.77, 1.01, 1.23, and 1.62 for Nongkenuo336, 2.09, 2.46, 2.98, and 3.32 for Jingjiuqingzhu16, 0.43, 0.61, 0.81, and 1.36 for Tianci19, 0.18, 0.26, 0.48, and 0.63 for Zhengdan958, and 0.81, 1.25, 1.56, and 1.92 for Xiangnuo2008. The MAEs for Zhengdan958 were significantly smaller than those for Jingjiuqingzhu16. The MAE for a given variety increased with increasing planting density.
These results show that the detection accuracy of the improved RetinaNet depends not only on planting density but also on plant variety. The main trend is that the detection error increased as the planting density increased. The main reason for this trend is that increasing planting density causes increasing overlap and shading between the tassels, interfering with the detection process. Maize tassel detection by the improved RetinaNet model also depended on the variety of maize tassels: Zhengdan958 has larger and more numerous branches, which resulted in lower detection errors, whereas Jingjiuqingzhu16 has thinner and less numerous branches, which resulted in higher detection errors. These results confirm that the tassel morphology of different plant varieties affects detection by the improved RetinaNet model.

5. Conclusions

The main purpose of this study was to improve the model for detection of maize tassels, especially when faced with multiscale maize tassel detection. We used an unmanned aerial vehicle (UAV) to obtain maize tassel images and created a maize tassel dataset, then optimized the FPN structure and introduced an attention mechanism based on the RetinaNet model. By improving the RetinaNet model, we increased the detection effectiveness for maize tassels of different sizes, thereby achieving the goal of multiscale tassel detection. In addition, we analyzed different image resolutions, brightness levels, plant varieties, and planting densities to determine how these factors affect maize tassel detection by the improved RetinaNet model.
The results of this research led to the following conclusions: Optimizing the FPN structure and introducing the CBAM attention mechanism significantly improved the ability of the improved RetinaNet model to detect maize tassels. The average precision of the improved RetinaNet algorithm was 0.9717, the precision was 0.9802, and the recall was 0.9036. The improved RetinaNet algorithm was also compared with the conventional object detection algorithms Faster R-CNN, YOLOX, and SSD. The improved RetinaNet algorithm detected maize tassels, especially smaller maize tassels, more accurately than these conventional algorithms. The detection accuracy of the improved RetinaNet model decreased as the image resolution decreased. In addition, the improved RetinaNet model detected maize tassels better at reduced image brightness than at increased brightness, and this effect was amplified for small maize tassels. The tassel morphology depends on the maize variety, so detection by the improved RetinaNet algorithm also depended on the maize variety. Of the five maize varieties tested, Zhengdan958 tassels were detected the most accurately and Jingjiuqingzhu16 tassels the least accurately. The detection accuracy also depended on planting density: increasing the planting density increased the error in maize tassel detection. This study used UAV remote sensing technology and computer vision technology to improve maize detasseling and tassel counting capabilities, which are vital for maize production and monitoring. In future work, we hope to predict maize tassel growth by using remote sensing and deep learning technologies.

Author Contributions

Conceptualization, B.W. and B.X.; methodology, B.W. and G.Y.; software, B.W. and J.G.; formal analysis, B.X. and H.Y.; investigation, S.X. and D.Z.; resources, B.X. and G.Y.; data curation, B.W. and B.X.; writing—original draft preparation, B.W.; writing—review and editing, B.W. and G.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China (grant no. 2021YFD2000100, no. 2022YFF1003500, no. 2021YFD1201601), Chongqing Technology Innovation and Application Development Special Project (grant no. cstc2021jscx-gksbX0064), Qingyuan Smart Agriculture Research Institute + New R&D Institutions Construction in North and West Guangdong (grant no. 2019B090905006), and Construction of Scientific and Technological Innovation Ability of Beijing Academy of Agriculture and Forestry Sciences (grant no. KJCX20230434).

Data Availability Statement

All data and code from this paper are available from the corresponding author, Bo Xu (xub@nercita.org.cn), upon request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Gage, J.L.; Miller, N.D.; Spalding, E.P. TIPS: A system for automated image-based phenotyping of maize tassels. Plant Methods 2017, 13, 21. [Google Scholar] [CrossRef]
  2. Huang, J.; Gómez-Dans, J.L.; Huang, H. Assimilation of remote sensing into crop growth models: Current status and perspectives. Agric. For. Meteorol. 2019, 276, 107609. [Google Scholar] [CrossRef]
  3. Su, Y.; Wu, F.; Ao, Z.; Jin, S.; Qin, F.; Liu, B.; Pang, S.; Liu, L.; Guo, Q. Evaluating maize phenotype dynamics under drought stress using terrestrial LiDAR. Plant Methods 2019, 15, 11. [Google Scholar] [CrossRef]
  4. Lu, H.; Cao, Z.; Xiao, Y.; Fang, Z.; Zhu, Y. Towards fine-grained maize tassel flowering status recognition: Dataset, theory and practice. Appl. Soft Comput. 2017, 56, 34–45. [Google Scholar] [CrossRef]
  5. Fan, B.; Li, Y.; Zhang, R. Review on the technological development and application of UAV systems. Chin. J. Electron. 2020, 29, 199–207. [Google Scholar] [CrossRef]
  6. Lu, H.; Cao, Z.; Xiao, Y.; Li, Y.; Zhu, Y. Joint crop and tassel segmentation in the wild. In Proceedings of the 2015 Chinese Automation Congress (CAC), Wuhan, China, 27–29 November 2015; pp. 474–479. [Google Scholar]
  7. Niu, Q.; Feng, H.; Yang, G.; Li, C.; Yang, H.; Xu, B.; Zhao, Y. UAV-based digital imagery for monitoring plant height and leaf area index of maize breeding material. Trans. Chin. Soc. Agric. Eng. 2018, 34, 73–82. [Google Scholar]
  8. Moreira, J.N.; Silva, P.S.L.; Silva, K.M.B.; Dombroski, J.L.D.; Castro, R.S. Effect of detasseling on baby corn, green ear and grain yield of two maize hybrids. Hortic. Bras. 2010, 28, 406–411. [Google Scholar] [CrossRef]
  9. Bolton, D.K.; Friedl, M.A. Forecasting crop yield using remotely sensed vegetation indices and crop phenology metrics. Agric. For. Meteorol. 2013, 173, 74–84. [Google Scholar] [CrossRef]
  10. Herrmann, I.; Bdolach, E.; Montekyo, Y.; Rachmilevitch, S.; Townsend, P.A.; Karnieli, A. Assessment of maize yield and phenology by drone-mounted superspectral camera. Precis. Agric 2019, 21, 51–76. [Google Scholar] [CrossRef]
  11. Yang, D.; Chen, G.; Wu, B. Effect of emasculation and top leaf on photosynthetic characteristics dry matter accumulation and yield of maize. Northwest Agric. J. 2022, 31, 25–33. [Google Scholar]
  12. Madec, S.; Jin, X.; Lu, H.; De Solan, B.; Liu, S.; Duyme, F. Ear density estimation from high resolution RGB imagery using deep learning technique. Agric. For. Meteorol. 2019, 264, 225–234. [Google Scholar] [CrossRef]
  13. Ortega-Blu, R.; Molina-Roco, M. Evaluation of vegetation indices and apparent soil electrical conductivity for site-specific vineyard management in Chile. Precis. Agric. 2016, 17, 434–450. [Google Scholar] [CrossRef]
  14. Kumar, A.; Taparia, M.; Rajalakshmi, P.; Desai, U.B.; Naik, B.; Guo, W. UAV Based Remote Sensing for Tassel Detection and Growth Stage Estimation of Maize Crop using F-RCNN. In Proceedings of the Computer Vision Problems in Plant Phenotyping, Long Beach, CA, USA, 17 June 2019. [Google Scholar]
  15. Lu, H.; Cao, Z.; Xiao, Y.; Li, Y.; Zhu, Y. Region-based color modelling for joint crop and maize tassel segmentation. Biosyst. Eng. 2016, 147, 139–150. [Google Scholar] [CrossRef]
  16. Chernov, V.; Alander, J.; Bochko, V. Integer-based accurate conversion between RGB and HSV color spaces. Comput. Electr. Eng. 2015, 46, 328–337. [Google Scholar] [CrossRef]
  17. Tang, W.; Zhang, Y.; Zhang, D. Research on corn tassel detection system based on machine vision. In Proceedings of the 2011 Academic Conference of China Agricultural Engineering Society, Zibo, China, 27–29 May 2011; pp. 1309–1314. [Google Scholar]
  18. Kurtulmuş, F.; Kavdir, I. Detecting corn tassels using computer vision and support vector machines. Expert Syst. Appl. 2014, 41, 7390–7397. [Google Scholar] [CrossRef]
  19. Mao, Z.; Sun, Y. Algorithm of male tassel recognition based on HSI space. Transducer Microsyst. Technol. 2018, 37, 117–119. [Google Scholar]
  20. Qi, Z. The Research on Extraction of Maize Phenotypic Information Based on Unmanned Aerial Vehicle. Ph.D. Thesis, Northeast Agricultural University, Harbin, China, 2017. [Google Scholar]
  21. Niu, Y.; Zhang, L.; Zhang, H.; Han, W.; Peng, X. Estimating above-ground biomass of maize using features derived from UAV-based RGB imagery. Remote Sens. 2019, 11, 1261. [Google Scholar] [CrossRef]
  22. Yeom, J.; Jung, J.; Chang, A.; Maeda, M.; Landivar, J. Automated Open Cotton Boll Detection for Yield Estimation Using Unmanned Aircraft Vehicle (UAV) Data. Remote Sens. 2018, 10, 1895. [Google Scholar] [CrossRef]
  23. Zaman-Allah, M.; Vergara, O.; Araus, J.; Tarekegne, A.; Magorokosho, C.; Zarco-Tejada, P.; Hornero, A.; Albà, A.H.; Das, B.; Craufurd, P.; et al. Unmanned aerial platform-based multi-spectral imaging for field phenotyping of maize. Plant Methods 2015, 11, 35. [Google Scholar] [CrossRef]
  24. Amit, Y.; Felzenszwalb, P.; Girshick, R. Object detection. In Computer Vision: A Reference Guide; Springer: Berlin/Heidelberg, Germany, 2020; pp. 1–9. [Google Scholar]
  25. Papageorgiou, C.P.; Oren, M.; Poggio, T. A general framework for object detection. In Proceedings of the Sixth International Conference on Computer Vision, Bombay, India, 7 January 1998; IEEE: New York, NY, USA, 1998; pp. 555–562. [Google Scholar]
  26. Krizhevsky, A.; Sutskever, I.; Hinton, G. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
  27. Zou, Z.; Chen, K.; Shi, Z.; Guo, Y.; Ye, J. Object detection in 20 years: A survey. arXiv 2019, arXiv:1905.05055. [Google Scholar] [CrossRef]
  28. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef]
  29. Deng, L.; Yu, D. Deep learning: Methods and applications. Found. Trends Signal Process. 2014, 7, 197–387. [Google Scholar] [CrossRef]
  30. Lu, H.; Cao, Z.; Xiao, Y.; Zhuang, B.; Shen, C. TasselNet: Counting maize tassels in the wild via local counts regression network. Plant Methods 2017, 13, 79. [Google Scholar] [CrossRef]
  31. Liang, Y.; Chen, Q.; Dong, C.; Yang, C. Application of deep-learning and UAV for field surveying corn tassel. Fujian Agric. J. 2020, 35, 456–464. [Google Scholar]
  32. Yang, S.; Liu, J.; Xu, K.; Sang, X.; Ning, J.; Zhang, Z. Improved centerNet based maize tassel recognition for UAV remote sensing image. Trans. Chin. Soc. Agric. Mach. 2021, 52, 206–212. [Google Scholar]
  33. Zhou, X.; Wang, D.; Krähenbühl, P. Objects as points. arXiv 2019, arXiv:1904.07850. [Google Scholar]
  34. Liu, Y.; Cen, C.; Che, Y.; Ke, R.; Ma, Y.; Ma, Y. Detection of maize tassels from UAV RGB imagery with faster R-CNN. Remote Sens. 2020, 12, 338. [Google Scholar] [CrossRef]
  35. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  36. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  37. Zou, H.; Lu, H.; Li, Y.; Liu, L.; Cao, Z. Maize tassels detection: A benchmark of the state of the art. Plant Methods 2020, 16, 108. [Google Scholar] [CrossRef]
  38. Wei, Y.; You, X.; Li, H. Multiscale patch-based contrast measure for small infrared target detection. Pattern Recognit. 2016, 58, 216–226. [Google Scholar] [CrossRef]
  39. Jiang, Y.; Dong, L.; Chen, Y.; Xu, W. An infrared small target detection algorithm based on peak aggregation and Gaussian discrimination. IEEE Access 2020, 8, 106214–106225. [Google Scholar] [CrossRef]
  40. Chen, Y.; Song, B.; Wang, D.; Guo, L. An effective infrared small target detection method based on the human visual attention. Infrared Phys. Technol. 2018, 95, 128–135. [Google Scholar] [CrossRef]
  41. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37. [Google Scholar]
  42. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  43. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271. [Google Scholar]
  44. Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
  45. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
  46. Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO series in 2021. arXiv 2021, arXiv:2107.08430. [Google Scholar]
  47. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
  48. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 39, 1137–1149. [Google Scholar] [CrossRef]
  49. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
  50. Li, J.; Li, C.; Fei, S.; Ma, C.; Chen, W.; Ding, F.; Wang, Y.; Li, Y.; Shi, J.; Xiao, Z. Wheat ear recognition based on retinaNet and transfer learning. Sensors 2021, 21, 4845. [Google Scholar] [CrossRef]
  51. Chen, Y.; Zhang, X.; Chen, W.; Li, Y.; Wang, J. Research on recognition of fly species based on improved RetinaNet and CBAM. IEEE Access 2020, 8, 102907–102919. [Google Scholar] [CrossRef]
  52. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
Figure 1. Location and layout of the experimental study area (A1–A5 represent different varieties).
Figure 2. Structure of RetinaNet network.
Figure 3. Structure of residual block: (a) BasicBlock, (b) BottleNeck.
Figure 4. Structure of classification and regression subnet.
Figure 5. The structure of the optimized feature pyramid network.
Figure 6. The structure of the CBAM attention mechanism.
Figure 7. Feature map of output at different scales.
Figure 8. P-R curves for different models.
Figure 9. Results of detection of maize tassels by different object detection models.
Figure 10. Detection results at different resolutions.
Figure 11. Detection of maize tassels at different brightness levels ((a1–a7) and (b1–b7) represent images at different brightness levels, respectively).
Figure 12. Morphology of maize tassels of different plant varieties.
Figure 13. Detection of maize tassels of different plant varieties based on improved RetinaNet model.
Figure 14. Mean absolute error for detection of maize tassels of different plant varieties under different planting densities.
Table 1. Agronomic characteristics of maize varieties.
Varieties | Agronomic Characteristics
Nongkenuo336 (A1) | Semi-compact, plant height 230 cm, ear height 82 cm, summer sowing growth period 95 days
Jingjiuqingzhu16 (A2) | Semi-compact, plant height 313 cm, ear height 131 cm, summer sowing growth period 97 days
Tianci19 (A3) | Semi-compact, plant height 263 cm, ear height 100 cm, summer sowing growth period 102 days
Zhengdan958 (A4) | Compact, plant height 269 cm, ear height 119 cm, summer sowing growth period 96 days
Xiangnuo2008 (A5) | Semi-compact, plant height 233 cm, ear height 89 cm, summer sowing growth period 90 days
Table 2. Parametrization.
Parameters | Value
Epoch | 100
Batch size | 4
Learning rate | 10^-4
Confidence threshold | 0.5
NMS threshold | 0.3
Table 3. Analysis of performance of different components introduced for detecting maize tassels.
Method | AP/% | Precision/% | Recall/%
RetinaNet | 95.33 | 96.45 | 85.76
RetinaNet-FPN | 96.13 | 97.68 | 88.25
RetinaNet-CBAM | 95.86 | 96.98 | 87.37
Improved RetinaNet | 97.17 | 98.02 | 90.36
Table 4. Comparison with mainstream methods.
Method | Average Precision/% | Precision/% | Recall/%
Faster R-CNN | 76.91 | 70.11 | 71.10
YOLOX | 90.04 | 96.95 | 80.68
SSD | 89.45 | 89.62 | 82.24
Improved RetinaNet | 97.17 | 98.02 | 90.36
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

