Article

A Cucumber Leaf Disease Severity Grading Method in Natural Environment Based on the Fusion of TRNet and U-Net

Hui Yao, Chunshan Wang, Lijie Zhang, Jiuxi Li, Bo Liu and Fangfang Liang
1 School of Information Science and Technology, Hebei Agricultural University, Baoding 071001, China
2 Hebei Key Laboratory of Agricultural Big Data, Baoding 071001, China
3 National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
4 College of Mechanical and Electrical Engineering, Hebei Agricultural University, Baoding 071001, China
* Author to whom correspondence should be addressed.
Agronomy 2024, 14(1), 72; https://doi.org/10.3390/agronomy14010072
Submission received: 2 December 2023 / Revised: 24 December 2023 / Accepted: 26 December 2023 / Published: 27 December 2023
(This article belongs to the Section Pest and Disease Management)

Abstract
Disease severity grading is the primary decision-making basis for determining the amount of pesticide used in vegetable disease prevention and control. Based on deep learning, this paper proposes an integrated framework that automatically segments the target leaf and disease spots in cucumber images using different semantic segmentation networks and then calculates the areas of the disease spots and the target leaf for disease severity grading. Two independent datasets of leaves and lesions were constructed, which served as the training sets for the first-stage diseased leaf segmentation model and the second-stage lesion segmentation model. The leaf dataset contains 1140 images, and the lesion dataset contains 405 images. The proposed TRNet is composed of a convolutional network and a Transformer network and achieved an accuracy of 93.94% by fusing local and global features for leaf segmentation. In the second stage, U-Net (with ResNet50 as the feature network) was used for lesion segmentation, and a Dice coefficient of 68.14% was obtained. After integrating TRNet and U-Net, a Dice coefficient of 68.83% was obtained. Overall, the two-stage segmentation network achieved an average accuracy of 94.49% and 94.43% in the severity grading of cucumber downy mildew and cucumber anthracnose, respectively. Compared with DUNet and BLSNet, the average accuracy of TUNet in cucumber downy mildew and cucumber anthracnose severity classification increased by 4.71% and 8.08%, respectively. The proposed model showed a strong capability in segmenting cucumber leaves and disease spots at the pixel level, providing a feasible method for evaluating the severity of cucumber downy mildew and anthracnose.

1. Introduction

The cucumber is a low-calorie vegetable containing a variety of vitamins and minerals and is widely grown all over the world. According to statistics from the Food and Agriculture Organization, the global cucumber planting area in 2020 was approximately 2.25 million hectares, with a yield of about 90.35 million tons, making it the third largest vegetable crop in the world [1]. The cucumber has a long growth cycle and may be affected by a number of diseases. As reported, there are 25 common cucumber diseases, among which 18 (more than 70%) are leaf diseases.
The prevention and control of cucumber diseases are therefore fraught with enormous challenges. First of all, three key questions must be addressed: What is the type of disease? Where are the lesions on infected leaves? How severe is the disease? These three questions align closely with the classification, detection, and segmentation tasks in computer vision. Specifically, “what” corresponds to the classification task: after a disease image is inputted into the model, it is expected to output the category label to which the image belongs based on the detected features. “Where” corresponds to the object detection and localization task: the model is not only required to identify the type of disease present in the image but also to indicate the location of the abnormality. “How” corresponds to the semantic segmentation task: through semantic segmentation, the model is expected to output a series of useful information, such as the size and location of disease spots, in order to comprehensively evaluate the disease severity and guide subsequent pesticide usage.
Among the three questions above, there have been a number of studies focusing on the classification and detection tasks related to the prevention and control of crop diseases, and fruitful results have been achieved. For example, Yang et al. [2] optimized GoogleNet for rice disease detection and achieved an accuracy of 99.58%. Muhammad et al. [3,4] constructed multiple convolutional neural network (CNN) structures and reported that the Xception and DenseNet architectures delivered better performance in multi-label plant disease classification. Zhang et al. [5] proposed to use residual paths instead of the original MU-Net skip connection in U-Net to segment diseased leaves of corn and cucumber, and the segmentation accuracy was significantly improved. Bhagat et al. [6,7] constructed the Eff-UNet++ model, which uses EfficientNet-B4 as the encoder and a modified UNet++ as the decoder. This model achieved a Dice coefficient of 83.44% in the segmentation of leaves in the KOMATSUNA dataset.
However, there are few studies on the severity grading of diseases. Assessing and grading disease severity has important practical significance because it directly affects the formulation of control plans and the prediction of crop losses and provides a basis for variable-rate spraying. The existing disease severity grading methods can be divided into two categories. The first category of methods directly constructs a classification model to identify the type and severity of the disease. For example, Esgario et al. [8] used two CNNs in parallel to classify the disease type and severity of coffee leaves and achieved accuracies of 95.24% and 86.51%, respectively. Liang et al. [9] proposed PD2SE-Net for category recognition, disease classification, and severity estimation, and the accuracy of disease severity estimation reached 91%. Hu et al. [10] improved the Faster R-CNN model for detecting tea tree leaves and then used the VGG16 network to classify the disease severity. Pan et al. [11] employed the Faster R-CNN model with VGG16 as the feature network to extract strawberry leaf spots to form a new dataset and then used a Siamese model to estimate the severity of strawberry leaf scorch. Dhiman et al. [12] classified the severity of citrus diseases as high, medium, low, and healthy and used an optimized VGGNet16 to classify the severity of diseased fruits, achieving an accuracy of 97%. The core of this type of method is to regard disease severity grading as a classification task in order to establish a relationship between the disease severity and the samples using an appropriate classification model. The advantage of such a method lies in its ease of implementation, while the disadvantage is that the disease severity labels in the dataset are manually annotated, which involves a high degree of subjectivity and lacks a stringent quantitative standard [13].
The second category of methods first segments the diseased regions through semantic segmentation and then calculates the ratio of the area of the diseased regions to the total area in order to estimate the disease severity. The essence of semantic segmentation is to classify images pixel by pixel. Wspanialy et al. [14] used an improved U-Net to segment nine tomato diseases in the PlantVillage tomato dataset and estimated the disease severity with an error rate of 11.8%. Zhang et al. [15] constructed a CNN model, which took cucumber downy mildew leaves with the background removed as the input, to estimate the severity of cucumber downy mildew and achieved a high level of accuracy (R2 = 0.9190). Gonçalves et al. [16] applied multiple semantic segmentation methods to laboratory-acquired images to evaluate the disease severity. The results indicated that DeepLab V3+ delivered the best performance in disease severity estimation. Lin et al. [17] proposed a CNN-based semantic segmentation model to segment cucumber powdery mildew images at the pixel level. The model achieved an average pixel accuracy of 96.08%, an IoU of 72.11%, and a Dice coefficient of 83.45% on 20 test samples. The advantage of this category of methods is that the grading criteria are usually objective and clear, while the disadvantage is that the complexity of the image background can seriously affect the segmentation accuracy.
In order to reduce the impact of complex backgrounds, Tassis et al. [18] proposed using Mask R-CNN to identify the positions of coffee leaves in real production environments first and then applying U-Net/PSPNet to segment the coffee leaves and disease spots simultaneously. Li et al. [19] further improved the segmentation accuracy of the model using a mixed attention mechanism that combined spatial attention and channel attention, with support from transfer learning. The model was used to automatically estimate the severity of cucumber leaf diseases under field conditions and achieved an R2 of 0.9578. The above studies have, to some extent, reduced the interference of complex backgrounds and achieved satisfactory results. However, segmenting leaves and disease spots simultaneously may affect the segmentation of disease spots due to pixel imbalance, leading to the omission of disease spots. In addition, none of these methods analyzed the overlap of leaves in images collected in actual production environments. In the process of image collection, the target leaf often overlaps with other leaves, and similar backgrounds can easily lead to over-segmentation. Given that the grading of disease severity requires the precise segmentation of leaves and disease spots, we proposed a two-stage segmentation method in this paper, whose contributions are as follows:
(1) An image dataset for cucumber leaf segmentation was constructed, which contains 1140 diseased or healthy cucumber leaf images in complex backgrounds.
(2) In order to improve the accuracy of disease classification and disease severity grading, we proposed a two-stage model. In the first stage, the diseased leaves were separated from the background, and in the second stage, the disease spots were separated from the diseased leaves.
(3) In order to reduce the interference from overlapping leaves and minimize over-segmentation, a convolutional structure and a Transformer were used simultaneously for feature extraction so as to fuse global and local features. Thus, the information loss caused by down-sampling could be compensated for, optimizing the segmentation of leaf edges.

2. Materials and Methods

2.1. Dataset

This study focused on the task of cucumber disease segmentation in complex environments. The image data came from a self-collected dataset acquired at the Xiaotangshan National Precision Agriculture Research Demonstration Base. Cucumber leaf images were captured with a mobile phone (Huawei P20). Considering the diversity of lighting conditions in practical applications, data were collected at three different time periods each day during the planting season: morning (8:00–10:00), noon (12:00–14:00), and afternoon (15:00–17:00). As shown in Figure 1 and Table 1, the collected images included five categories, namely healthy cucumber leaves, cucumber downy mildew, cucumber anthracnose, cucumber powdery mildew, and cucumber virus disease. The dataset used to train leaf segmentation consists of 1140 images, which were randomly divided into a training set and a test set at a ratio of 8:2, with 912 images used for training and 227 images used for testing. It should be noted that after a leaf is infected with powdery mildew or viral disease, the boundary between the diseased area and the healthy area of the leaf is blurred, which makes it extremely difficult to label the boundaries of the disease spots. Therefore, in the training stage of the lesion segmentation model, the dataset we used consists of 405 images, and the disease types include only downy mildew and anthracnose. Cucumber leaf images were cropped from the original size of 2976 × 2976 pixels to 512 × 512 pixels. The Labelme 5.0.0 software was used to label the leaves and disease spots at the pixel level and to generate mask images, as shown in Figure 2.
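To make such a split reproducible in practice, a minimal sketch of how the 8:2 division might be scripted is given below; the directory layout, file extension, and naming convention are assumptions for illustration and are not taken from the paper.

```python
# Minimal sketch of the 8:2 train/test split described above.
# The directories "leaf_dataset/images" and "leaf_dataset/masks" and the
# one-mask-per-image naming convention are assumptions, not the authors' layout.
import random
from pathlib import Path

def split_dataset(image_dir="leaf_dataset/images", mask_dir="leaf_dataset/masks",
                  train_ratio=0.8, seed=42):
    images = sorted(Path(image_dir).glob("*.png"))
    # Pair each 512x512 crop with its Labelme-derived mask of the same file name.
    pairs = [(img, Path(mask_dir) / img.name) for img in images]
    random.Random(seed).shuffle(pairs)           # fixed seed for reproducibility
    n_train = int(len(pairs) * train_ratio)
    return pairs[:n_train], pairs[n_train:]

train_pairs, test_pairs = split_dataset()
print(len(train_pairs), "training pairs,", len(test_pairs), "test pairs")
```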

2.2. Two-Stage Model

Due to interference from complex backgrounds and significant differences in the scale of disease spots, it is difficult to achieve the accurate segmentation of cucumber leaves and disease spots using a one-stage segmentation model. Therefore, a two-stage model was designed in this study to decompose a complex task into two simple subtasks. The proposed two-stage model, namely TUNet, consisted of TRNet and U-Net. In the first stage, TRNet was used to segment the target leaf from complex backgrounds. In the second stage, U-Net was used to further segment disease spots from the obtained target leaf. The advantage of two-stage segmentation is that the model only needs to focus on one type of target at each stage (leaf target in the first stage and disease spot target in the second stage). For the two different targets, semantic segmentation models with different structures were selected according to the specific needs of each target, so as to combine the advantages of the two models to improve the segmentation accuracy. The framework of the proposed two-stage model is shown in Figure 3. The structure and key algorithms of the two models used will be described in detail in the following sections.
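As a rough illustration of this two-stage inference flow (TRNet isolates the target leaf, U-Net then segments lesions inside it, and the area ratio feeds the grading step), a hedged PyTorch sketch is given below; the model objects, preprocessing, and checkpoint handling are placeholders rather than the authors' released code.

```python
# Sketch of the two-stage TUNet inference flow described above, assuming both
# networks output per-pixel class logits for a binary task.
import torch

@torch.no_grad()
def grade_leaf(image, trnet, unet):
    """image: (1, 3, H, W) tensor normalized as in training (assumption)."""
    leaf_logits = trnet(image)                           # stage 1: leaf vs. background
    leaf_mask = leaf_logits.argmax(dim=1, keepdim=True)  # (1, 1, H, W), values in {0, 1}
    leaf_only = image * leaf_mask                        # remove the complex background
    lesion_logits = unet(leaf_only)                      # stage 2: lesion vs. leaf
    lesion_mask = lesion_logits.argmax(dim=1, keepdim=True)
    ratio = lesion_mask.sum().item() / max(leaf_mask.sum().item(), 1)
    return leaf_mask, lesion_mask, ratio                 # ratio feeds the grading table
```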

2.3. First-Stage Segmentation Model TRNet

In 2015, Long et al. [20] proposed the fully convolutional network (FCN) by removing the fully connected layer; this was the first time that a CNN realized pixel-by-pixel classification of images. The semantic segmentation networks that emerged later, such as the U-Net and DeepLab series, were all based on the FCN approach [21,22,23,24,25]. The CNN has a strong advantage in obtaining local features: it captures local structure by utilizing local receptive fields, shared weights, and spatial subsampling [26]. In addition, the hierarchical structure of convolutional kernels takes into account different levels of complexity in local spatial contexts, from simple low-level edges and textures to higher-order semantic patterns [27]. These strengths enable the CNN structure to effectively extract the local features of cucumber leaves in complex backgrounds. However, the CNN has significant limitations in extracting global features due to the loss of spatial resolution in multiple rounds of down-sampling. In contrast, the Transformer, which captures complex spatial transformations and long-distance feature dependencies through a self-attention mechanism, has not only transformed the NLP field but also opened up new possibilities in the image field [28]. The emergence of the vision Transformer has greatly inspired semantic segmentation, and semantic segmentation models such as SETR have been proposed one after another [29,30]. The SETR model regards the segmentation task as a sequence-to-sequence prediction task and uses a Transformer as the encoder. In each layer of the encoder, global context modeling is conducted so that the limitations of the CNN in long-distance relationship learning are resolved. Nonetheless, in the field of semantic segmentation, a pure Transformer is not flawless. During feature extraction with the Transformer, it is necessary to flatten two-dimensional image patches into a one-dimensional sequence. Unfortunately, spatially adjacent pixels are usually highly correlated, and this flattening may lead to the loss of structural information in the patches. Consequently, in the decoding stage, the detailed information cannot be effectively restored through up-sampling, resulting in poor segmentation results.
Therefore, we designed the TRNet model, which combines a Transformer and a convolutional structure to extract features in parallel. This design not only improved the extraction of global features but also maintained an excellent grasp of low-level details. Thus, it could effectively reduce the interference caused by overlapping leaves and improve the accuracy of leaf segmentation. The encoder part of TRNet is composed of ResNet50 and a Transformer, as shown in Figure 4.

2.3.1. ResNet50

In this paper, after taking into account the network performance and model size, ResNet50 was chosen as the network for extracting local features [31]. ResNet50 is an architecture based on multi-layer convolution and identity mapping, as shown in Figures 4 and 5. For a given input image, ResNet50 first conducts a convolution operation and a max pooling operation on the image. The subsequent operations consist of four stages, namely Stage 1, Stage 2, Stage 3, and Stage 4, each of which starts with a Conv Block followed by different numbers of Identity Blocks. From Figures 4 and 5, it can be seen that each block contains three convolutional layers. The difference between the Conv Block and the Identity Block lies in that the Conv Block uses a convolution kernel for dimension adjustment on the residual shortcut, which can be expressed as:
Identity Block: $H(x) = F(x) + x$  (1)
Conv Block: $H(x) = F(x) + W(x)$  (2)
where $W$ denotes the convolution applied to the shortcut connection.
After each stage, the size of the feature map is reduced by half, while the number of channels doubles. The final output is $X \in \mathbb{R}^{\frac{H}{16} \times \frac{W}{16} \times 2048}$.
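For readers who prefer code to equations, the following sketch shows how the two bottleneck variants in Equations (1) and (2) are typically implemented in PyTorch; it mirrors the standard ResNet-50 design rather than any cucumber-specific modification.

```python
# Sketch of the two ResNet-50 bottleneck variants: the Identity Block adds the input
# directly (H(x) = F(x) + x), while the Conv Block projects the shortcut with a
# 1x1 convolution (H(x) = F(x) + W(x)).
import torch.nn as nn

class Bottleneck(nn.Module):
    def __init__(self, in_ch, mid_ch, out_ch, stride=1, project_shortcut=False):
        super().__init__()
        self.f = nn.Sequential(                       # F(x): 1x1 -> 3x3 -> 1x1 convolutions
            nn.Conv2d(in_ch, mid_ch, 1, bias=False), nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, 1, bias=False), nn.BatchNorm2d(out_ch),
        )
        # Conv Block: 1x1 convolution on the shortcut for channel/stride matching.
        self.shortcut = (nn.Sequential(nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                                       nn.BatchNorm2d(out_ch))
                         if project_shortcut else nn.Identity())
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # H(x) = F(x) + x for the Identity Block, or F(x) + W(x) for the Conv Block.
        return self.relu(self.f(x) + self.shortcut(x))
```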

2.3.2. Transformer

As shown in Figure 4, since the Transformer module cannot directly process three-dimensional image data, the input image first needs to be transformed into a vector sequence through an embedding layer. Considering that ResNet50 down-samples the input image by a factor of 16, the length of the Transformer input sequence is designed to be $\frac{H}{16} \times \frac{W}{16}$, with an embedding dimension of C. The input image is divided into 1024 patches with a size of 16 × 16. Then, each patch is mapped into a one-dimensional vector through linear mapping, to which position encoding is added. Subsequently, the obtained vector sequence is inputted into the Transformer Encoder for feature learning. From Figure 6, it can be seen that the Transformer Encoder mainly consists of L layers, each containing a Multi-Head Self-Attention (MSA) module and a multi-layer perceptron (MLP) module. As shown in Equations (3) and (4), for layer l, after the input sequence passes through the Transformer layer, the output is obtained as follows:
$Z_l' = Z_{l-1} + \mathrm{MSA}(\mathrm{LN}(Z_{l-1}))$  (3)
$Z_l = Z_l' + \mathrm{MLP}(\mathrm{LN}(Z_l'))$  (4)
where the MSA operation is realized by projecting the concatenation of m self-attention (SA) operations, as shown in Equations (5)–(7):
$\mathrm{MSA}(Z_{l-1}) = \mathrm{Concat}(\mathrm{SA}_1(Z_{l-1}), \mathrm{SA}_2(Z_{l-1}), \ldots, \mathrm{SA}_m(Z_{l-1}))\, W^{O}$  (5)
$\mathrm{SA}(Z_{l-1}) = \mathrm{softmax}\!\left(\frac{Q K^{T}}{\sqrt{d}}\right) V$  (6)
$Q = Z_{l-1} W^{Q}, \quad K = Z_{l-1} W^{K}, \quad V = Z_{l-1} W^{V}$  (7)
where $W^{O} \in \mathbb{R}^{md \times C}$ and $W^{Q}, W^{K}, W^{V} \in \mathbb{R}^{C \times d}$ are learnable projection matrices, and $d$ is the dimension of $K$.
Finally, the Transformer features are reshaped back into a two-dimensional feature map, and the output is $X \in \mathbb{R}^{\frac{H}{16} \times \frac{W}{16} \times 768}$.
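The following PyTorch sketch illustrates one pre-norm encoder layer corresponding to Equations (3) and (4); the hidden size of 768 follows the text, while the number of heads and the MLP expansion ratio are typical ViT defaults assumed here for illustration.

```python
# Sketch of one pre-norm Transformer encoder layer as in Equations (3) and (4).
# This is an illustrative re-implementation, not the authors' code.
import torch.nn as nn

class EncoderLayer(nn.Module):
    def __init__(self, dim=768, heads=12, mlp_ratio=4):
        super().__init__()
        self.ln1 = nn.LayerNorm(dim)
        self.msa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ln2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim * mlp_ratio), nn.GELU(),
                                 nn.Linear(dim * mlp_ratio, dim))

    def forward(self, z):                      # z: (B, N, dim) patch sequence
        h = self.ln1(z)
        z = z + self.msa(h, h, h)[0]           # Eq. (3): Z' = Z + MSA(LN(Z))
        z = z + self.mlp(self.ln2(z))          # Eq. (4): Z  = Z' + MLP(LN(Z'))
        return z
```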

2.3.3. Decoder

The feature maps of the same scale but with different channel numbers outputted by ResNet50 and the Transformer are concatenated and then inputted into the decoder. The decoder adopts the naive structure in SETR, which consists of two 1 × 1 convolutions with BatchNorm in between. The last 1 × 1 convolution maps each feature vector to the required number of categories. Then, bilinear up-sampling is performed directly to obtain an output with the same resolution as the original image, that is, $X \in \mathbb{R}^{H \times W \times N_{cls}}$, where $N_{cls}$ is the number of categories.
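A minimal sketch of this fusion-plus-naive-decoder head is given below; the channel widths of 2048 and 768 follow the text, whereas the intermediate width and the ReLU placement are assumptions made for illustration.

```python
# Sketch of the decoder head described above: concatenate the H/16 x W/16 feature maps
# from ResNet-50 (2048 ch) and the Transformer branch (768 ch), apply
# 1x1 conv + BatchNorm + 1x1 conv, and upsample bilinearly to full resolution.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NaiveDecoder(nn.Module):
    def __init__(self, cnn_ch=2048, vit_ch=768, mid_ch=256, num_cls=2):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(cnn_ch + vit_ch, mid_ch, 1, bias=False),
            nn.BatchNorm2d(mid_ch),
            nn.ReLU(inplace=True),               # activation placement is an assumption
            nn.Conv2d(mid_ch, num_cls, 1),       # map to the number of categories
        )

    def forward(self, cnn_feat, vit_feat, out_size):
        x = torch.cat([cnn_feat, vit_feat], dim=1)   # fuse local + global features
        x = self.head(x)                             # (B, num_cls, H/16, W/16)
        return F.interpolate(x, size=out_size, mode="bilinear", align_corners=False)
```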

2.4. Second-Stage Segmentation Model U-Net

In the second stage, considering the multi-scale nature of the disease spots and the relatively small size of the dataset, we chose the U-Net structure with ResNet50 as the backbone to segment disease spots (see Figure 7 for the network structure). U-Net is a model based on an encoder–decoder structure, consisting of two parts, where ResNet50 serves as the encoder (details in Section 2.3.1). The structure of U-Net is symmetric: after the image has been down-sampled 32 times, up-sampling is conducted, and after each round of up-sampling, the feature map is fused along the channel dimension with the encoder feature map of the same scale. Since the feature maps at the top levels of the network have a small down-sampling factor, more details can be retained, resulting in more detailed feature maps. On the contrary, the feature maps at the bottom of the network lose a lot of information during down-sampling due to the large down-sampling factor, resulting in a significant loss of spatial detail. However, this also produces highly condensed semantic information, which is conducive to locating the target region, while the skip connections effectively retain the detailed information in the image.
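As an illustration of the skip-connection fusion described above, the sketch below shows one generic U-Net decoder step; the channel numbers and the use of transposed convolution are assumptions, not details taken from the paper.

```python
# Sketch of one U-Net decoder step: upsample the coarse feature map, then concatenate
# it with the encoder feature map of matching scale before further convolution.
import torch
import torch.nn as nn

class UpBlock(nn.Module):
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(out_ch + skip_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.up(x)                          # restore spatial resolution
        x = torch.cat([x, skip], dim=1)         # fuse same-scale encoder features
        return self.conv(x)
```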

2.5. Disease Severity Grading Method

Steps for Disease Severity Grading. Cucumber diseases of different severity levels require different prevention and control methods and different amounts of pesticide: excessive spraying harms the ecological environment and food safety, whereas insufficient usage fails to control the disease. In this study, we calculated the ratio of the total area of disease spots on each leaf to the area of the entire leaf by segmenting the target leaf and the disease spots separately, and this ratio was used as the basis of disease severity grading. The specific steps are as follows:
Step 1: The leaf and complex backgrounds in the image are considered as the targets. Then, the complex backgrounds in the manually labeled mask image are removed to obtain a complete leaf.
Step 2: The mask image obtained in Step 1 is taken as the input of the second stage, which is segmented to obtain disease spots.
Step 3: The ratio of the total area of disease spots to the area of the entire leaf is calculated according to Equation (8). Then, this ratio is compared with the disease severity grading standard to derive the final grading result.
$P = \frac{S_{Disease}}{S_{Leaf}} \times 100\%$  (8)
where $S_{Leaf}$ refers to the area of the leaf after segmentation; $S_{Disease}$ refers to the total area of disease spots after segmentation; and $P$ refers to the ratio of the total area of disease spots to the area of the entire leaf.
Disease Severity Grading Standard. Referring to the relevant disease severity grading standards and suggestions from plant protection experts, the severity of cucumber diseases was classified into five levels in this study, as shown in Table 2.
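Combining Equation (8) with the thresholds in Table 2, the grading rule can be expressed in a few lines; the function below is an illustrative sketch that accepts any pair of binary masks and is not the authors' implementation.

```python
# Sketch of Step 3 plus the grading rule in Table 2: compute the lesion-to-leaf pixel
# ratio P and map it to one of the severity levels (Level 0 corresponds to P = 0).
def grade_severity(lesion_mask, leaf_mask):
    """Binary masks (0/1) of equal shape; returns (P in percent, severity level)."""
    s_leaf = float(leaf_mask.sum())
    s_disease = float(lesion_mask.sum())
    p = 100.0 * s_disease / s_leaf if s_leaf > 0 else 0.0   # Equation (8)
    # Upper bounds per Table 2: 0%, 5%, 10%, 25%, 50%, 100%.
    thresholds = [(0.0, 0), (5.0, 1), (10.0, 2), (25.0, 3), (50.0, 4), (100.0, 5)]
    for upper, level in thresholds:
        if p <= upper:
            return p, level
    return p, 5
```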

3. Experiment and Analysis

The hardware configuration for training and testing was as follows: Intel Xeon (R) Gold 6248 CPU @ 3.00 GHz × 96; 256 GB Memory; and NVIDIA GeForce RTX 3090 Graphics Card. The software configuration was as follows: 64-bit Ubuntu 20.04.5 LTS operating system; CUDA Version 11.4; and Pytorch Version 1.13.0. In order to ensure a fair comparison, the hyperparameters of each network were uniformly configured. After repeated trial and error, the hyperparameters were determined as shown in Table 3.
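For reference, a minimal training loop matching the hyperparameters in Table 3 (Adam, learning rate 1 × 10⁻⁴, 200 epochs, batch size 4) might look as follows; the loss function, data pipeline, and model are placeholders, since the paper does not specify them here.

```python
# Sketch of a training setup using the Table 3 hyperparameters; everything not listed
# in Table 3 (loss, dataset object, device handling) is an assumption.
import torch
from torch.utils.data import DataLoader

def train(model, train_set, device="cuda"):
    loader = DataLoader(train_set, batch_size=4, shuffle=True, num_workers=4)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = torch.nn.CrossEntropyLoss()      # assumed loss; masks as class indices
    model.to(device).train()
    for epoch in range(200):
        for images, masks in loader:
            images, masks = images.to(device), masks.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), masks)
            loss.backward()
            optimizer.step()
```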

3.1. Evaluation Indicators

The performance of the model is evaluated using four evaluation metrics: Pixel Accuracy (PA), IoU, Dice, and Recall. Pixel accuracy represents the proportion of correctly predicted pixels to the total pixels. IoU is used to calculate the ratio of the intersection and union of the two sets of true values and predicted values for each category. The calculation of PA and IoU is as follows:
$PA = \frac{\sum_{i=0}^{K} P_{ii}}{\sum_{i=0}^{K}\sum_{j=0}^{K} P_{ij}}$  (9)
$IoU = \frac{P_{ii}}{\sum_{j=0}^{K} P_{ij} + \sum_{j=0}^{K} P_{ji} - P_{ii}}$  (10)
where $P_{ij}$ refers to the number of pixels of class i predicted as class j, and $P_{ii}$ refers to the number of pixels of class i predicted as class i, i.e., the number of correctly classified pixels. The value of K in each stage of the two-stage model is 1. Specifically, in the first stage, class 1 represents the leaf, while in the second stage, it represents the lesion.
The Dice coefficient is usually used to calculate the similarity between two samples, and its value range is [0, 1]. A Dice value close to 1 indicates high set similarity, that is, the target is well separated from the background, whereas a value close to 0 indicates that the target cannot be effectively segmented from the background. Recall is the ratio of the number of samples correctly predicted as positive to the total number of positive samples. Dice and Recall are calculated as follows:
$Dice = \frac{2 \times TP}{FN + FP + 2 \times TP}$  (11)
$Recall = \frac{TP}{FN + TP}$  (12)
where TP, FP, and FN denote the numbers of true positives, false positives, and false negatives, respectively.
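A sketch of how these four metrics can be computed for a single foreground class from a binary prediction and its ground truth is given below; it uses plain tensor operations and is illustrative rather than the authors' evaluation code.

```python
# Sketch of Equations (9)-(12) for a binary (foreground/background) segmentation.
import torch

def binary_metrics(pred, target, eps=1e-7):
    """pred, target: {0,1} tensors of identical shape (single foreground class)."""
    tp = ((pred == 1) & (target == 1)).sum().float()
    fp = ((pred == 1) & (target == 0)).sum().float()
    fn = ((pred == 0) & (target == 1)).sum().float()
    tn = ((pred == 0) & (target == 0)).sum().float()
    pa = (tp + tn) / (tp + tn + fp + fn + eps)      # Eq. (9): pixel accuracy
    iou = tp / (tp + fp + fn + eps)                 # Eq. (10): foreground IoU
    dice = 2 * tp / (2 * tp + fp + fn + eps)        # Eq. (11)
    recall = tp / (tp + fn + eps)                   # Eq. (12)
    return {"PA": pa.item(), "IoU": iou.item(),
            "Dice": dice.item(), "Recall": recall.item()}
```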

3.2. Comparison of Different Segmentation Models

To verify the effectiveness of TRNet and U-Net(ResNet50), U-Net, U-Net(MobileNet), DeepLabV3+(ResNet50), DeepLabV3+(MobileNet), SETR, and PSPNet(ResNet50) were chosen as control models for the first and second stages in this study, and comparisons of the results are shown in Table 4 and Table 5 [32,33]. All of the above models were implemented on the created dataset. The weights with the best training performance were saved and used for testing, and the mask image obtained in testing was overlaid onto the original image to produce the segmentation result. The quantitative results are presented in tables, and the qualitative results are visualized as renderings.
Disease images collected in a production environment suffer from overlapping leaves and complex backgrounds, which makes it difficult to separate leaves from the background. For the model to accurately segment the target leaf, it must take global features into account while also paying attention to local features. TRNet combines the advantages of the Transformer and the convolutional neural network. The Transformer's grasp of global features allows the model to better attend to the entire image, increases the attention weight on the target leaf, and reduces segmentation errors caused by complex backgrounds. At the same time, the focus on local features makes TRNet equally sensitive to detailed features within the target leaf. Therefore, the TRNet network achieved the best leaf segmentation performance, with a PA of 93.94%, an IoU of 96.86%, a Dice coefficient of 72.25%, and a Recall of 98.60%. Compared with the SETR model using the Transformer as the encoder, the PA improved by 2.38%, the IoU by 4.25%, the Dice coefficient by 1.13%, and the Recall by 2.46%. Among the segmentation networks using convolutional networks as encoders, DeepLabV3+(ResNet50) achieved the highest metrics, with a PA, IoU, Dice coefficient, and Recall of 92.90%, 95.49%, 71.65%, and 97.42%, respectively. The PA, IoU, Dice coefficient, and Recall of TRNet increased by 1.04%, 1.37%, 0.6%, and 1.18%, respectively, compared to DeepLabV3+(ResNet50). It can be seen that the segmentation performance of TRNet was significantly improved, which further shows that the combination of the Transformer and the CNN is effective.
In the second-stage task, the model needed to extract complete disease spots from the target leaf, which required the model to extract finer features. Since ResNet50 is deeper and wider than the original U-Net encoder, it can extract more comprehensive disease spot information. Therefore, in the fine segmentation of lesions, U-Net with ResNet50 as the feature extraction network achieved the optimal performance, with the IoU, Dice coefficient, and Recall reaching 52.52%, 68.14%, and 73.46%, respectively. Compared with the original U-Net, the IoU, Dice coefficient, and Recall improved by 2.87%, 3.14%, and 7.45%, respectively, and they were 8.04%, 7.88%, and 14.63% higher than those of the Transformer-based SETR network. The Transformer branch of the proposed TRNet model, which extracts global features, had a slight negative impact on the fine segmentation of lesions, so the indicators of this model were slightly lower than those of U-Net(ResNet50).
To further demonstrate the superiority of TRNet and U-Net(ResNet50), we visualized the first-stage and second-stage segmentation results, as shown in Figure 8. It can be seen that, in the first stage, the CNN-based models could completely segment the target leaf but were inevitably affected by the complex backgrounds, resulting in varying degrees of over-segmentation. The SETR model, which relies purely on the Transformer as the feature extractor, was obviously less affected by overlapping leaves. This is largely because the Transformer mainly focuses on global features. On the other hand, the SETR model was significantly weaker than the CNN-based models in extracting the local features of the cucumber leaf. TRNet, which combines the advantages of both, could more completely segment the target leaf from the complex background and was less disturbed by environmental factors.
In the second stage, the image containing disease spots has a simple background free of external interference, so attention to local features becomes more important. Except for the original U-Net and U-Net(ResNet50), all the other models mistakenly segmented the junction between two adjacent disease spots, while the original U-Net also missed some minor disease spots. It can be seen that the U-Net model had a significant advantage in fusing multi-scale features for the segmentation of small disease spots. Moreover, ResNet50, as the feature extractor, enabled the precise extraction of local features. Overall, TRNet and U-Net(ResNet50) achieved the best performance on the test set compared with the control models. Therefore, the remainder of this paper focuses on the fusion of these two models.

3.3. Comparison of Model Fusion Methods

The method proposed in this paper first segments the complete leaf from the complex background and then segments the disease spots from the target leaf against a simple background to eventually achieve disease severity grading. The intention of two-stage segmentation was not only to remove complex interference factors but also to utilize the complementary advantages of different models to improve the segmentation accuracy. Therefore, the fusion of appropriate models was crucial. In this study, TRNet and U-Net(ResNet50), which delivered the best performance in the first and second stages, respectively, were used for leaf segmentation, and the extracted mask map was further processed to segment the disease spots. To verify the advantage of the TUNet model, we also chose the models delivering the second-best performance in the first and second stages, i.e., DeepLabV3+(ResNet50) and TRNet, and fused them with the best performers. In the end, four combination schemes were formed for comparative analysis.
As shown in Table 6, Scheme 1 used TRNet for the segmentation in both stages. Scheme 2 used TRNet in the first stage and U-Net(ResNet50) in the second stage. Scheme 3 used DeepLabV3+(ResNet50) in the first stage and TRNet in the second stage. Scheme 4 used DeepLabV3+(ResNet50) in the first stage and U-Net(ResNet50) in the second stage.
A comparison of the results is shown in Table 7. It can be seen that the indicators of the fusion models were similar across the two categories, because the lesions of cucumber downy mildew and cucumber anthracnose are similar. For both diseases, the performance of Scheme 1 was slightly better than that of Scheme 3 and Scheme 4. This is because Scheme 1 used TRNet as the first-stage model, so the leaf segmentation was more accurate. Scheme 2 outperformed all the other fusion schemes on all metrics (IoU, Dice coefficient, and Recall). It was also noted that the indicators of Scheme 1, Scheme 3, and Scheme 4 were lower than those obtained before fusion, and only Scheme 2 yielded higher values for all the indicators after fusion than before. In contrast to the declines observed for the other combinations, the complementary advantages of the two models were fully realized in Scheme 2.
The segmentation results of the various fusion schemes are shown in Figure 9. It can be seen that Scheme 3 and Scheme 4, which used DeepLabV3+ for segmentation in the first stage, mistakenly segmented some leaves with similar colors as the target leaf, resulting in the segmentation of disease spots from non-target leaves in the second stage. Therefore, the final accuracy was reduced. For Scheme 1 and Scheme 2, the TRNet model performed well in the first stage and fully segmented the contour of the target leaf. However, for disease spots of varying sizes, the multi-scale segmentation of U-Net apparently outperformed other schemes. Based on the advantages and disadvantages of the four schemes and the actual production needs, Scheme 2 was ultimately chosen as the cucumber disease segmentation model in this study.

3.4. Two-Stage Model

Considering that segmenting leaves and disease spots from complex backgrounds simultaneously with a one-stage model is extremely challenging, we proposed a two-stage segmentation method in this paper. Specifically, the purpose of the first stage was to remove the complex background, and the purpose of the second stage was to segment the disease spots against a simple background. In order to verify the improvement of the proposed two-stage model over one-stage segmentation, we chose U-Net(ResNet50), which delivered the best performance for disease spot segmentation, to extract disease spots from both complex and simple backgrounds.
The segmentation results are shown in Figure 10. It can be seen that the results obtained from two-stage segmentation were far better than those obtained from one-stage segmentation. In the three images shown in Figure 10, some disease spots on non-target leaves were mistakenly segmented using one-stage segmentation. This is because one-stage segmentation does not remove confounding factors, such as overlapping leaves before disease spot segmentation, leading to poor segmentation results. Therefore, in this study, a two-stage model was proposed for disease severity grading in order to guarantee a high classification accuracy.

3.5. Disease Severity Grading

At present, there is no unified standard for the severity grading of cucumber downy mildew. According to the relevant literature, commonly used methods for the severity grading of cucumber downy mildew are mainly based on (1) the ratio of the total area of disease spots to the area of the entire leaf and (2) the number of disease spots per unit leaf area. In this study, the first method was adopted. The disease severity was divided into five levels, as detailed in Section 2.5 and Table 2. Figure 11 shows images of cucumber downy mildew and cucumber anthracnose from severity Level 1 to Level 5.
We used TRNet and U-Net to segment the target leaf and disease spots, respectively, and calculated the ratio of the pixel area of the disease spots to the pixel area of the leaf. Then, the severity of cucumber downy mildew and cucumber anthracnose was graded according to the specified grading standard. In this study, 90 cucumber downy mildew images and 94 cucumber anthracnose images were selected as test objects, and the predicted disease severity was compared with the manually labelled severity to evaluate the classification accuracy of the model. The experimental results are shown in Table 8 and Table 9. It can be seen from Table 8 that the classification accuracy of cucumber downy mildew for Levels 1, 2, 3, 4, and 5 was 100.00%, 100.00%, 94.44%, 92.31%, and 85.71%, respectively, with an average accuracy of 94.49%. According to Table 9, the classification accuracy of cucumber anthracnose for Levels 1, 2, 3, 4, and 5 was 100.00%, 96.00%, 100.00%, 92.85%, and 83.33%, respectively, with an average accuracy of 94.43%. In general, the model had a high prediction accuracy for severity Levels 1 to 3 but performed suboptimally for Levels 4 and 5. This is because the edges of leaves with Level 4–5 cucumber downy mildew or cucumber anthracnose were mostly withered, and the model might recognize such edges as background in the first-stage segmentation, resulting in a lower accuracy.
A comparison of the results of the proposed model TUNet and the existing models is shown in Table 10. Ref. [34] uses the two-stage method DUNet to segment diseased leaves and lesions, and Ref. [13] uses an improved U-Net model to simultaneously segment leaves and lesions. As can be seen in Table 10, TUNet has a higher accuracy in disease severity grading compared to Ref. [34]. The one-stage model in Ref. [13] has a speed advantage, but the accuracy is much lower than the two-stage model.
As can be seen in Figure 12, both Refs. [13,34] have problems with over-segmentation, that is, the lesions on the edge of the leaves are classified as background, resulting in an incorrect classification of disease severity. DUNet failed to segment lesions due to the incorrect segmentation of leaves in the first stage, resulting in an incorrect input in the second stage, which illustrates the importance of the first-stage model in the two-stage method. Our method adds global features to the first-stage model for context modeling so that it can correctly determine whether the edge lesion is part of the leaf, thus avoiding the over-segmentation problem. However, TUNet still has shortcomings in the segmentation of small lesions, which needs further improvement.

4. Conclusions

This paper proposed a two-stage model, namely TUNet, for grading the severity of cucumber leaf diseases. The proposed model consists of two segmentation networks, TRNet and U-Net. In the first stage, we chose TRNet to extract the target cucumber leaf from the image. The TRNet network uses both a convolutional structure and a Transformer to extract image features, so it can compensate for the loss of global information caused by down-sampling in the convolutional structure. The combination of global and local information not only improved the segmentation accuracy of the target leaf but also effectively reduced the impact of complex backgrounds on the segmentation task. Then, the segmented leaf image with a simple background was used as the input of the second-stage segmentation, and U-Net with ResNet50 as the backbone network was chosen to extract disease spots from the image. We found that, with ResNet50 as the backbone network, the model could accurately detect and segment very small objects, which is conducive to disease spot segmentation. Furthermore, we compared these two models with several classic models. The experimental results showed that these two networks outperformed the other models in leaf segmentation and disease spot segmentation, and their fusion yielded even better results. Finally, the cucumber disease severity was graded by calculating the ratio of the total area of disease spots to the area of the entire leaf. The results showed that the two-stage model proposed in this study performed well in grading the severity of cucumber downy mildew and cucumber anthracnose under real production environments. It is worth noting that our approach also has limitations. First of all, the proposed TRNet model takes a long time to run inference, and this time cost cannot be ignored. Therefore, future research should focus on lightweight model structures in order to shorten the segmentation time. Secondly, in addition to the proportion of the lesion area, disease severity classification also needs to consider the color of the lesions and whether the diseased leaves are perforated. Therefore, a more accurate classification of disease severity requires the comprehensive consideration of the multiple factors mentioned above.

Author Contributions

H.Y.: Writing—Original draft preparation; C.W.: Methodology; L.Z.: Writing—Reviewing and Editing; J.L.: Investigation; B.L.: Software; F.L.: Data curation. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under grant number 62106065 and, in part, by the Natural Science Foundation of Hebei province under grant number F2022204004.

Data Availability Statement

The data presented in this study are available from the corresponding author upon reasonable request.

Acknowledgments

We are grateful to our colleagues at the Hebei Key Laboratory of Agricultural Big Data, the National Engineering Research Center for Information Technology in Agriculture, and the Institute of Vegetables and Flowers of the Chinese Academy of Agricultural Sciences for their help and input, without which this study would not have been possible.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Food and Agriculture Organization of the United Nations. Food and Agriculture Data. Available online: http://www.fao.org/faostat/en/#home (accessed on 15 July 2021).
  2. Yang, L.; Yu, X.; Zhang, S.; Long, H.; Zhang, H.; Xu, S.; Liao, Y. GoogLeNet based on residual network and attention mechanism identification of rice leaf diseases. Comput. Electron. Agric. 2023, 204, 107543. [Google Scholar] [CrossRef]
  3. Gulzar, Y. Fruit image classification model based on MobileNetV2 with deep transfer learning technique. Sustainability 2023, 15, 1906. [Google Scholar] [CrossRef]
  4. Kabir, M.M.; Ohi, A.Q.; Mridha, M.F. A Multi-Plant Disease Diagnosis Method Using Convolutional Neural Network. arXiv 2020, arXiv:2011.05151. [Google Scholar]
  5. Zhang, S.; Zhang, C. Modified U-Net for plant diseased leaf image segmentation. Comput. Electron. Agric. 2023, 204, 107511. [Google Scholar] [CrossRef]
  6. Bhagat, S.; Kokare, M.; Haswani, V.; Hambarde, P.; Kamble, R. Eff-UNet++: A novel architecture for plant leaf segmentation and counting. Ecol. Inform. 2022, 68, 101583. [Google Scholar] [CrossRef]
  7. Gulzar, Y.; Ünal, Z.; Aktaş, H.; Mir, M.S. Harnessing the Power of Transfer Learning in Sunflower Disease Detection: A Comparative Study. Agriculture 2023, 13, 1479. [Google Scholar] [CrossRef]
  8. Esgario, J.G.M.; Krohling, R.A.; Ventura, J.A. Deep learning for classification and severity estimation of coffee leaf biotic stress. Comput. Electron. Agric. 2020, 169, 105162. [Google Scholar] [CrossRef]
  9. Liang, Q.; Xiang, S.; Hu, Y.; Coppola, G.; Zhang, D.; Sun, W. PD2SE-Net: Computer-assisted plant disease diagnosis and severity estimation network. Comput. Electron. Agric. 2019, 157, 518–529. [Google Scholar] [CrossRef]
  10. Hu, G.; Wang, H.; Zhang, Y.; Wan, M. Detection and severity analysis of tea leaf blight based on deep learning. Comput. Electr. Eng. 2021, 90, 107023. [Google Scholar] [CrossRef]
  11. Pan, J.; Xia, L.; Wu, Q.; Guo, Y.; Chen, Y.; Tian, X. Automatic strawberry leaf scorch severity estimation via faster R-CNN and few-shot learning. Ecol. Inform. 2022, 70, 101706. [Google Scholar] [CrossRef]
  12. Dhiman, P.; Kukreja, V.; Manoharan, P.; Kaur, A.; Kamruzzaman, M.M.; Dhaou, I.B.; Iwendi, C. A Novel Deep Learning Model for Detection of Severity Level of the Disease in Citrus Fruits. Electronics 2022, 11, 495. [Google Scholar] [CrossRef]
  13. Chen, S.; Zhang, K.; Zhao, Y.; Sun, Y.; Ban, W.; Chen, Y.; Zhuang, H.; Zhang, X.; Liu, J.; Yang, T. An Approach for Rice Bacterial Leaf Streak Disease Segmentation and Disease Severity Estimation. Agriculture 2021, 11, 420. [Google Scholar] [CrossRef]
  14. Wspanialy, P.; Moussa, M. A detection and severity estimation system for generic diseases of tomato greenhouse plants. Comput. Electron. Agric. 2020, 178, 105701. [Google Scholar] [CrossRef]
  15. Zhang, L.-x.; Tian, X.; Li, Y.-x.; Chen, Y.-q.; Chen, Y.-y.; Ma, J.-c. Estimation of Disease Severity for Downy Mildew of Greenhouse Cucumber Based on Visible Spectral and Machine Learning. Spectrosc. Spectr. Anal. 2020, 40, 227–232. [Google Scholar]
  16. Gonçalves, J.P.; Pinto, F.A.C.; Queiroz, D.M.; Villar, F.M.M.; Barbedo, J.G.A.; Del Ponte, E.M. Deep learning architectures for semantic segmentation and automatic estimation of severity of foliar symptoms caused by diseases or pests. Biosyst. Eng. 2021, 210, 129–142. [Google Scholar] [CrossRef]
  17. Lin, K.; Gong, L.; Huang, Y.; Liu, C.; Pan, J. Deep Learning-Based Segmentation and Quantification of Cucumber Powdery Mildew Using Convolutional Neural Network. Front. Plant Sci. 2019, 10, 155. [Google Scholar] [CrossRef] [PubMed]
  18. Tassis, L.M.; de Souza, J.E.T.; Krohling, R.A. A deep learning approach combining instance and semantic segmentation to identify diseases and pests of coffee leaves from in-field images. Comput. Electron. Agric. 2022, 193, 106732. [Google Scholar] [CrossRef]
  19. Li, K.; Zhang, L.; Li, B.; Li, S.; Ma, J. Attention-optimized DeepLab V3 + for automatic estimation of cucumber disease severity. Plant Methods 2022, 18, 109. [Google Scholar] [CrossRef]
  20. Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. arXiv 2015, arXiv:1411.4038. [Google Scholar]
  21. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Munich, Germany, 5–9 October 2015; Springer: Cham, Switzerland; Volume 9351, pp. 234–241.
  22. Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. arXiv 2016, arXiv:1412.7062. [Google Scholar]
  23. Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. arXiv 2017, arXiv:1606.00915. [Google Scholar] [CrossRef] [PubMed]
  24. Chen, L.-C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv 2017, arXiv:1706.05587. [Google Scholar]
  25. Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. arXiv 2018, arXiv:1802.02611. [Google Scholar]
  26. Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-Based Learning Applied to Document Recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
  27. Wu, H.; Xiao, B.; Codella, N.; Liu, M.; Dai, X.; Yuan, L.; Zhang, L. CvT: Introducing Convolutions to Vision Transformers. arXiv 2021, arXiv:2103.15808. [Google Scholar]
  28. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762. [Google Scholar]
  29. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16 × 16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  30. Zheng, S.; Lu, J.; Zhao, H.; Zhu, X.; Luo, Z.; Wang, Y.; Fu, Y.; Feng, J.; Xiang, T.; Torr, P.H.S.; et al. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers. arXiv 2020, arXiv:2012.15840. [Google Scholar]
  31. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv 2015, arXiv:1512.03385. [Google Scholar]
  32. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
  33. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. arXiv 2017, arXiv:1612.01105. [Google Scholar]
  34. Wang, C.; Du, P.; Wu, H.; Li, J.; Zhao, C.; Zhu, H. A cucumber leaf disease severity classification method based on the fusion of DeepLabV3+ and U-Net. Comput. Electron. Agric. 2021, 189, 106373. [Google Scholar] [CrossRef]
Figure 1. Samples of cucumber leaf images.
Figure 2. Image segmentation labels.
Figure 3. TUNet framework diagram.
Figure 4. TRNet structure diagram.
Figure 5. Two types of residual modules: (a) Identity block and (b) Conv block.
Figure 6. Structure of Transformer Encoder.
Figure 7. U-Net network structure diagram.
Figure 8. Visualization of model segmentation.
Figure 9. The results of the fusion scenario are visualized.
Figure 10. Comparison of segmentation results between one-stage and two-stage models.
Figure 11. Example plot of downy mildew and anthracnose severity ratings in cucumbers.
Figure 12. A comparison of the results of segmentation.
Table 1. Number of disease species.
Type of Disease              Quantity
Cucumber downy mildew        290
Cucumber anthracnose         115
Cucumber powdery mildew      185
Cucumber virus disease       336
Healthy leaf                 214
Total                        1140
Table 2. Disease severity grading standard.
Disease Grade    Proportion of Disease Spots (P)
Level 0          P = 0%
Level 1          0% < P ≤ 5%
Level 2          5% < P ≤ 10%
Level 3          10% < P ≤ 25%
Level 4          25% < P ≤ 50%
Level 5          50% < P ≤ 100%
Table 3. Configuration of hyperparameters.
Hyperparameter    Value
Learning rate     1 × 10⁻⁴
Epoch             200
Batch size        4
Optimizer         Adam
Table 4. Comparison of first-stage models.
First-Stage Model          PA (%)   IoU (%)   Dice (%)   Recall (%)
DeepLabV3+(ResNet50)       92.90    92.48     95.63      96.02
DeepLabV3+(MobileNet)      92.78    92.22     95.50      95.67
U-Net                      91.91    89.35     93.67      95.38
U-Net(ResNet50)            92.86    92.23     95.50      95.67
U-Net(MobileNet)           92.94    89.35     93.69      95.38
PSPNet(ResNet50)           92.88    92.36     95.48      95.90
TRNet (ours)               93.94    94.20     96.98      96.91
SETR                       91.56    89.65     93.99      95.65
Table 5. Comparison of second-stage models.
Second-Stage Model         IoU (%)   Dice (%)   Recall (%)
DeepLabV3+(ResNet50)       37.94     53.36      71.68
DeepLabV3+(MobileNet)      45.21     57.36      55.07
U-Net                      49.64     65.00      66.01
U-Net(ResNet50)            52.51     68.14      73.46
U-Net(MobileNet)           48.40     64.16      69.20
PSPNet(ResNet50)           50.81     66.44      65.87
TRNet (ours)               52.33     67.87      68.44
SETR                       44.47     60.26      58.83
Table 6. Model fusion schemes.
Scheme    First-Stage Model        Second-Stage Model
1         TRNet                    TRNet
2         TRNet                    U-Net(ResNet50)
3         DeepLabV3+(ResNet50)     TRNet
4         DeepLabV3+(ResNet50)     U-Net(ResNet50)
Table 7. Results of model fusion schemes.
          Cucumber Downy Mildew                Cucumber Anthracnose
Scheme    IoU (%)   Dice (%)   Recall (%)      IoU (%)   Dice (%)   Recall (%)
1         51.24     67.34      67.34           52.08     67.98      69.44
2         54.12     68.79      75.97           54.44     68.89      74.34
3         51.03     67.11      67.06           51.94     67.32      69.28
4         52.33     67.97      73.95           52.02     67.64      72.37
Table 8. Disease severity grading results of cucumber downy mildew.
Disease Grade    Number of Images    Correct Grading    Accuracy (%)
Level 1          23                  23                 100.00
Level 2          22                  22                 100.00
Level 3          18                  17                 94.44
Level 4          13                  12                 92.31
Level 5          14                  12                 85.71
Table 9. Disease severity grading results of cucumber anthracnose.
Disease Grade    Number of Images    Correct Grading    Accuracy (%)
Level 1          22                  22                 100.00
Level 2          25                  24                 96.00
Level 3          21                  21                 100.00
Level 4          14                  13                 92.85
Level 5          12                  10                 83.33
Table 10. Comparison of results of cucumber downy mildew and anthracnose grading.
Method         Level 1    Level 2    Level 3    Level 4    Level 5
DUNet [34]     91.10%     95.72%     92.06%     89.01%     80.95%
BLSNet [13]    88.93%     87.45%     89.68%     81.31%     73.21%
TUNet          100%       98%        97.22%     92.58%     84.52%
