1. Introduction
Wheat is one of the main food sources for humans, contributing about 20% of total dietary calories and protein [1]. With the growing world population, a steady increase in wheat production is of great significance. Wheat production has traditionally been threatened by various diseases, pests, and abiotic stresses. According to statistics, global wheat yield losses caused by fungal diseases are as high as 15% to 20% [2]. Among them, wheat Fusarium head blight (FHB) is one of the most harmful fungal diseases. It is mainly caused by Fusarium graminearum, which infects spikelet florets at the flowering stage and spreads along the spike axis during grain filling and maturation. The production and accumulation of toxins, such as deoxynivalenol (DON), nivalenol (NIV), and zearalenone (ZEN) [3], can reduce the yield and quality of wheat and cause great harm to human and animal health [4]. The breeding of FHB-resistant varieties is one of the most important means of mitigating the effects of the disease. To develop resistant varieties, hundreds of lines must be assessed each year for FHB severity. Protocols for assessing FHB resistance often rely on manual inspection: the severity of FHB in a wheat spike can be scored accurately by counting infected spikelets and calculating their percentage of the total spikelets [5]. However, this traditional approach is time-consuming, labor-intensive, and prone to human error. Thus, there is a pressing need for a more effective, non-destructive, and high-throughput approach to assess this disease in the field.
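The manual scoring protocol described above reduces to a simple percentage; a minimal sketch (the spikelet counts used here are illustrative, not from the study):

```python
# Manual FHB scoring: severity is the percentage of infected spikelets
# in a spike. The counts passed in below are illustrative examples.

def fhb_severity(infected_spikelets, total_spikelets):
    """Severity (%) = infected spikelets / total spikelets * 100."""
    if total_spikelets <= 0:
        raise ValueError("total_spikelets must be positive")
    return 100.0 * infected_spikelets / total_spikelets

print(fhb_severity(7, 20))  # 35.0
```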
Commonly used methods for FHB detection mainly rely on visual analysis, chromatography, polymerase chain reaction (PCR), and enzyme-linked immunosorbent assay (ELISA). Inspection by experienced experts is prone to subjective interference and human error. Biochemical methods, such as chromatography and ELISA, are very accurate, but they often require complex processing steps that are not suitable for analyzing a large number of FHB-infected wheat spikes in practice [6]. In recent years, imaging and spectroscopic methods, including near-infrared spectroscopy (NIRS) and hyperspectral imaging (HSI), have shown strong potential in agriculture and food analysis, especially in crop disease detection [7,8,9,10,11,12,13,14,15,16]. NIRS is based on differences in the absorption, emission, or transmission of light by substances, whose fingerprint features are related to changes in the apparent color, internal composition, and structure of the sample [17]. Peiris et al. [18] proposed an automated single-kernel NIRS method for classifying healthy and FHB-infected wheat with an accuracy as high as 99.9%. However, NIRS can only acquire point measurements of a kernel and cannot achieve large-scale, rapid classification. As a non-invasive, high-throughput, and remote sensing method for plant phenotyping, HSI merges spatial and spectral information into a 3D data cube, but the resulting data volume is too large for real-time analysis [4,19].
With the development of artificial intelligence, color imaging combined with machine learning has made great progress. Conventional machine learning methods, such as random forest (RF), K-nearest neighbor (KNN), linear discriminant analysis (LDA), and partial least squares discriminant analysis (PLS-DA), have been widely used for crop plant detection and disease evaluation [4,9]. However, these methods perform poorly on large-scale datasets and in complex feature scenarios. As a newer branch of machine learning, deep learning (DL) has gradually shown its advantages in image classification, object detection, and natural language processing [20,21], and DL methods have become a preferred approach for disease identification in agricultural fields [22]. Convolutional neural networks (CNNs) have been widely used in agriculture in recent years, extracting key features through different combinations of layers, the translation invariance of convolutional operators, and the spatial relationships between adjacent data. Classical CNNs, such as LeNet-5 [23], AlexNet [24], ResNet [25], and VGG [26], have been successfully employed for plant disease detection [27,28]. Combining machine learning and deep learning, Hassan et al. [29] proposed two methods, shallow VGG with RF and shallow VGG with XGBoost, to identify diseases in corn, potato, and tomato, with an average accuracy as high as 95.70%. Hassan et al. [30] proposed a novel CNN model based on inception and residual connections to classify diseases of four different plants; the testing accuracies on the PlantVillage, rice, and cassava datasets were 99.39%, 99.66%, and 76.59%, respectively. In practical applications, increasing the number of convolutional layers and kernels of a CNN enables the model to extract more abstract and refined features, thereby exhibiting excellent performance [26]. However, this may cause the CNN to lose focus on the relevant features and suffer from vanishing gradients. Residual modules effectively create shortcuts in a sequential network, using shortcut connections to weaken the continuous multiplication effect in gradient backpropagation and thereby alleviate the vanishing gradient problem [25,31]. Girshick, one of the first authors to apply deep learning to object detection, used the region-based convolutional neural network (R-CNN) model to increase the detection rate from 35.1% to 53.7% on the PASCAL VOC dataset [32]. Subsequently, the Fast R-CNN and Faster R-CNN models built on R-CNN greatly improved the accuracy and speed of wheat spike detection [33,34,35]. A pulse-coupled neural network (PCNN) with K-means clustering and an improved artificial bee colony (IABC) algorithm was developed by Zhang et al. [36] to segment wheat spikes infected with FHB. However, that study considered only a single spike per image, which is not practical for high-throughput detection under field conditions. Kumar et al. [37] used a deep convolutional neural network (DCNN) to automatically classify four wheat rust diseases, achieving an accuracy of 97.16%.
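The residual shortcut mentioned earlier in this section amounts to y = x + F(x): the identity path contributes a gradient of one, so backpropagation avoids pure multiplicative decay. A minimal sketch, with a toy transform standing in for the convolutional layers:

```python
# Sketch of a residual (shortcut) connection: the block's output is the
# input plus a learned transform, y = x + F(x). The transform F below is
# a toy stand-in for real convolutional layers.

def residual_block(x, transform):
    """y = x + F(x), element-wise over a feature vector."""
    fx = transform(x)
    return [xi + fi for xi, fi in zip(x, fx)]

def toy_transform(x):
    # Leaky-ReLU-like squashing of each feature (illustrative only).
    return [0.1 * xi if xi < 0 else 0.5 * xi for xi in x]

y = residual_block([1.0, -2.0, 4.0], toy_transform)
print(y)  # [1.5, -2.2, 6.0]
```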
Mask R-CNN is a deep learning algorithm that achieves instance segmentation [38]. It extends Faster R-CNN by adding a branch that predicts an object mask and by replacing the RoI Pooling layer with RoI Align, which solves the misalignment problem caused by quantization in RoI Pooling. Kumar et al. [39] quantified the severity of loose smut in wheat using Mask R-CNN; however, the degree of disease was not graded, the accuracy of disease degree classification was not verified, and the calculation of the proportion of diseased area over the whole leaf was not clearly described. Kumar et al. [40] used Mask R-CNN to recognize wheat yellow rust disease, but that study used a limited dataset and the background segmentation was inadequate. Yang et al. [41] demonstrated the potential of Mask R-CNN to identify leaves in plant images for rapid phenotyping, with an average accuracy of up to 91.5%. Su et al. [42] utilized a dual deep learning framework based on Mask R-CNN to assess wheat FHB severity in field trials. Given the deficiencies of these previous strategies, more advanced models are needed to evaluate the resistance of wheat to FHB. Recently, Chen et al. [43] proposed BlendMask, an instance segmentation model more advanced than Mask R-CNN. BlendMask is a state-of-the-art instance segmentation method built on the fully convolutional one-stage (FCOS) object detection network [43]. Xi et al. [44] used two instance segmentation networks, BlendMask and Mask R-CNN, to delineate ginkgo tree crowns; their results showed that BlendMask outperformed Mask R-CNN. Compared to Mask R-CNN, BlendMask requires less computation, produces higher-quality masks, and has a stable inference time. It is therefore promising to apply the BlendMask model to the image segmentation of wheat spikes and the recognition of diseased areas.
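The core idea of BlendMask's Blender module is that each detected instance combines a small set of shared "base" maps with its own per-instance attention weights. A simplified sketch of that blend step (toy sizes and values, not the official implementation):

```python
# Minimal sketch of the Blender idea in BlendMask: each instance's mask is
# an attention-weighted sum of K shared base maps. Sizes and values below
# are illustrative toys, not the network's real outputs.

def blend(bases, attention):
    """Element-wise attention-weighted sum of K base maps.

    bases:     list of K HxW maps (nested lists) shared across instances
    attention: list of K HxW attention maps predicted for one instance
    returns:   one HxW instance mask (before sigmoid/thresholding)
    """
    K = len(bases)
    H, W = len(bases[0]), len(bases[0][0])
    return [[sum(attention[k][i][j] * bases[k][i][j] for k in range(K))
             for j in range(W)]
            for i in range(H)]

# Toy example: K = 2 bases on a 2x2 grid.
bases = [
    [[1.0, 0.0], [0.0, 0.0]],   # base 0 responds to the top-left region
    [[0.0, 0.0], [0.0, 1.0]],   # base 1 responds to the bottom-right region
]
attention = [
    [[0.9, 0.9], [0.9, 0.9]],   # this instance weights base 0 strongly
    [[0.1, 0.1], [0.1, 0.1]],   # and base 1 weakly
]
mask = blend(bases, attention)
print(mask)  # [[0.9, 0.0], [0.0, 0.1]]
```

In the real network the attention maps are predicted per detection by the FCOS head and resized to the base resolution before blending.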
The assessment of plant disease severity is another important and challenging task in agriculture, and efficient evaluation methods should be of great help to growers and breeders. Table 1 summarizes studies that used advanced deep learning methods to assess crop disease severity. Esgario et al. [45] established five models to classify the severity of coffee disease into five grades, with ResNet50 achieving the best accuracy of 84.13%. Pan et al. [46] proposed Faster R-CNN (VGG16) and Siamese networks for strawberry leaf scorch severity estimation; an accuracy of 88.3% was achieved on a new dataset, but the manual labeling method is time-consuming and prone to subjective errors. Joshi et al. [47] used VirLeafNet to classify Vigna mungo disease into three grades, reaching an accuracy of 91.5%. Although these studies performed well in determining the severity of plant diseases, their accuracies were lower than that of the protocol proposed in the current study. In another study, Zhang et al. [48] used the ratio of the number of diseased wheat spikes to the total number of spikes to evaluate disease severity; however, this method ignored overlapping wheat spikes in the image. In the studies of Ji et al. [49] and Wu et al. [50], an improved YOLOv5 and DeepLabV3+ achieved accuracies of 97.75% and 95.34% in evaluating the disease severities of grape and pepper, respectively. Different from these single-stage segmentation methods, Liu et al. [51] developed a two-stage framework to automatically estimate the severity of apple leaf disease in the field, yielding an accuracy of 96.41%; however, the applicability of their framework was not validated on multi-leaf images.
With advancements in machine learning and deep learning techniques, methods for plant disease detection have shown promising performance. However, the original data for model training have mostly been acquired in the lab, which limits the performance of such methods under real field conditions [52]. The main objective of this study was to investigate the feasibility of automatic tandem dual BlendMask networks for assessing wheat FHB severity in field trials. The specific steps of this study were to: (1) capture high-quality images of wheat spikes in the field, (2) annotate wheat spikes and diseased areas in the raw images, (3) train a BlendMask model to detect and segment wheat spikes in full-size images, (4) train a second BlendMask model to predict diseased areas in individual spikes, (5) write a program that combines the dual BlendMask networks to simultaneously display the results of wheat spike detection and diseased area segmentation in full-size images, and (6) evaluate the disease grade of wheat FHB based on the ratio of the diseased area to the overall wheat spike. To our knowledge, this is the first study to assess the severity of wheat FHB based on automatic tandem dual BlendMask networks.
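Steps (5) and (6) above can be sketched as a small post-processing routine: given the two networks' masks for one spike, compute the diseased-area ratio and map it to a grade. The function names stand in for the two trained BlendMask models' outputs, and the grade thresholds below are hypothetical, not the paper's actual cut-offs:

```python
# Illustrative sketch of the tandem pipeline's final stage: one spike mask
# (from the first BlendMask) and one disease mask (from the second) yield
# a diseased-area ratio, which is binned into a grade. Thresholds are
# hypothetical placeholders.

def severity_ratio(spike_mask, disease_mask):
    """Ratio of diseased pixels to total spike pixels for one spike."""
    spike_area = sum(sum(row) for row in spike_mask)
    diseased = sum(sum(1 for s, d in zip(srow, drow) if s and d)
                   for srow, drow in zip(spike_mask, disease_mask))
    return diseased / spike_area if spike_area else 0.0

def grade(ratio, thresholds=(0.05, 0.25, 0.50)):
    """Map a diseased-area ratio to a grade (hypothetical thresholds)."""
    mild, moderate, severe = thresholds
    if ratio < mild:
        return "healthy"
    if ratio < moderate:
        return "mild"
    if ratio < severe:
        return "moderate"
    return "severe"

# Toy 2x3 binary masks: 6 spike pixels, 2 of them diseased.
spike   = [[1, 1, 1], [1, 1, 1]]
disease = [[0, 1, 1], [0, 0, 0]]
r = severity_ratio(spike, disease)
print(round(r, 3), grade(r))  # 0.333 moderate
```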
4. Classification of Wheat FHB Severity Grades
Figure 10a depicts the ground truth (the visual rating of spikes in the acquired images by an expert) of wheat spikes with different disease grades in the training set. As seen in Figure 10a, 21.2%, 28.2%, 32.5%, and 18.1% of the samples in the training set are categorized as healthy, mild, moderate, and severe, respectively. Figure 10b shows the ground truth and predictions for wheat spikes with different disease grades in the validation set. In the ground truth, 20%, 29.6%, 30.3%, and 20.1% of the validation samples were categorized as healthy, mild, moderate, and severe, respectively, which is generally similar to the training set distribution. The distribution of the predicted results across the four grades is almost identical to that of the ground truth. The maximum error between the predicted and actual counts occurs for the severe grade (10 samples), while the minimum bias is only 1 sample, for the mild grade.
The statistical analysis of the actual and predicted disease severity values across the four grades in Table 5 shows that the predicted value is always less than or equal to the actual value, whether for the average or for the maximum and minimum severity. Therefore, the model may tend to underestimate disease severity in practical applications.
To further verify the accuracy of the classification results, a confusion matrix was applied to analyze the similarities and differences between the predicted results and the ground truth. Precision and sensitivity were first obtained, and from these the F1-score for each of the four grades was calculated. As shown in Table 6, all values are above 90%: the lowest F1-score was 91.1%, the highest was 93.2%, and the average F1-score was 92.22%.
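The per-grade metrics in Table 6 follow directly from the confusion matrix: precision and sensitivity come from column and row sums, and F1 is their harmonic mean. A sketch of that computation (the 4x4 counts below are made up for illustration, not the paper's actual matrix):

```python
# Derive per-class precision, sensitivity (recall), and F1-score from a
# confusion matrix. Rows = ground truth, columns = prediction, in the order
# healthy, mild, moderate, severe. The counts are illustrative only.

def per_class_metrics(cm):
    n = len(cm)
    metrics = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, truth other
        fn = sum(cm[c]) - tp                        # truth c, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * sensitivity / (precision + sensitivity)
              if precision + sensitivity else 0.0)
        metrics.append((precision, sensitivity, f1))
    return metrics

cm = [
    [90, 8, 2, 0],
    [7, 91, 2, 0],
    [0, 2, 93, 5],
    [0, 0, 7, 93],
]
for name, (p, s, f1) in zip(["healthy", "mild", "moderate", "severe"],
                            per_class_metrics(cm)):
    print(f"{name}: P={p:.3f} R={s:.3f} F1={f1:.3f}")
```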
Figure 11 depicts the confusion matrix for the wheat FHB grades. The average accuracy for FHB severity classification was 91.8%, with accuracies of 90%, 91%, 93%, and 93% for the four grades, respectively. As shown in Figure 11, mild-grade samples are most easily misclassified as healthy. This may be because the healthy and mild grades have similar diseased-area ratios; since the diseased area for these two grades is relatively small, the probability of misidentification increases. Additionally, moderate-grade samples are more likely to be misclassified as severe, probably owing to the small difference in the ratio of diseased area to spike area between the moderate and severe grades.
5. Discussion
This research proposed a new approach using tandem dual BlendMask networks for automatic severity estimation of wheat FHB in the field. Three main parts were involved: wheat spike segmentation, FHB disease segmentation, and disease severity classification. RGB images were used to train the dual BlendMask framework to evaluate the severity of wheat FHB. Although positive detection results were obtained, there were still some errors in the prediction of disease severity. These errors may come from the algorithm model or from the image annotation. Data annotation is a bottleneck for segmentation tasks [59]: the annotation process is laborious and time-consuming, and the results are strongly affected by human factors. In the future, instance segmentation based on semi- and weakly-supervised methods can be considered. To maximize the utilization of the dataset, data augmentation is also an essential step. Fang et al. [60] used data augmentation to address insufficient training data in instance segmentation. Applying random transformations, such as flipping, cropping, and changing saturation, can generate new images that effectively augment the training set.
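The augmentations just mentioned can be sketched in a few lines; the version below operates on nested lists of RGB tuples so it is self-contained, whereas in practice a library such as torchvision or albumentations would be used:

```python
# Minimal sketches of the augmentations mentioned above: horizontal flip,
# random crop, and a crude saturation change. Written on nested lists of
# (R, G, B) tuples purely for illustration.

import random

def hflip(img):
    """Horizontal flip: reverse each row of pixels."""
    return [row[::-1] for row in img]

def random_crop(img, ch, cw, rng):
    """Crop a ch x cw window at a random position."""
    h, w = len(img), len(img[0])
    top = rng.randrange(h - ch + 1)
    left = rng.randrange(w - cw + 1)
    return [row[left:left + cw] for row in img[top:top + ch]]

def scale_saturation(img, factor):
    """Crude saturation change: push each channel away from the pixel mean."""
    out = []
    for row in img:
        new_row = []
        for px in row:
            mean = sum(px) / len(px)
            new_row.append(tuple(
                max(0, min(255, round(mean + factor * (c - mean)))) for c in px))
        out.append(new_row)
    return out

# Toy 2x2 RGB "image".
img = [[(200, 10, 10), (10, 200, 10)],
       [(10, 10, 200), (90, 90, 90)]]
rng = random.Random(0)
augmented = [hflip(img), random_crop(img, 1, 1, rng), scale_saturation(img, 0.5)]
```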
Recognizing wheat spikes in the field is a challenging task. In this study, the FCOS detector in the BlendMask framework was used for wheat FHB detection, and the model was capable of detecting wheat spikes within a complex environment. Compared with the Laws texture energy method [61], the BlendMask model achieved a higher accuracy of 85.56% for the identification of high-density wheat spikes. The main reason for this success is the FCOS detector module, which generates features in a top-down manner. Although the SpikeSegNet model achieved an accuracy of 99.91% in the study of Misra et al. [62], the wheat spikes in their images were low-density.
The two trained BlendMask models are connected in series to directly display the recognition results of wheat spikes and diseased areas on the original images. BlendMask performed very well, yielding a detection rate as high as 99.32% for FHB detection, compared to 98.81% in the study of Su et al. [42]. The main reasons for the success of our model are as follows: (1) compared to Mask R-CNN, the Blender module of BlendMask provides higher-quality masks, and (2) the wheat in the dataset was annotated with high precision, which helps to improve the performance and robustness of the BlendMask model. The proportion of diseased area over the whole wheat spike is displayed directly on the original input image, so high-throughput, real-time analysis can be conducted in the field. Many transformer variants, such as the vision transformer (ViT) [63] and the Swin Transformer [64], perform well in image classification, target recognition, and image segmentation; in the future, more advanced segmentation algorithms based on transformers should be considered.
The automatic tandem dual BlendMask networks successfully segmented individual wheat spikes and FHB-diseased areas simultaneously from images of multiple spikes with complex backgrounds. The proposed method showed great potential for the non-destructive, high-throughput evaluation of wheat FHB severity. Su et al. [42] used the same dataset as this study with a Mask R-CNN model to segment individual wheat spikes and the disease spots on them, achieving segmentation detection accuracies of 77.76% and 98.81%, respectively. In this study, the accuracies for these two tasks were improved to 85.56% and 99.32%, respectively. Moreover, Mask R-CNN took 250,000 iterations to reach its accuracy, whereas BlendMask reached the accuracy mentioned above after 170,000 iterations; the BlendMask model is therefore more concise and efficient. By linking the two models, the average time to assess the severity of a wheat spike was 0.09 s.
The constructed model is intended for use with a vehicle-mounted camera taking photos in the field, so it is oriented toward proximal sensing. Nowadays, unmanned aerial vehicles (UAVs) are widely used for the efficient detection of crop diseases because they can sense a larger area with less manpower, but such detection generally relies on expensive remote sensing equipment mounted on the UAVs. In future studies, we will try to improve the model and adapt it to high-density wheat images or videos taken by UAVs. When a UAV flies at low altitude over wheat of the same variety in the field, a relatively low-cost ordinary camera can be used to take photos or video and evaluate the severity of wheat FHB. This would help screen out wheat with better disease resistance.