Article

Image Segmentation Method for Sweetgum Leaf Spots Based on an Improved DeeplabV3+ Network

College of Mathematics & Computer Science, Zhejiang A & F University, Hangzhou 311300, China
*
Author to whom correspondence should be addressed.
Forests 2022, 13(12), 2095; https://doi.org/10.3390/f13122095
Submission received: 13 September 2022 / Revised: 29 November 2022 / Accepted: 1 December 2022 / Published: 8 December 2022
(This article belongs to the Special Issue Ecology and Management of Forest Pests)

Abstract

This paper presents a sweetgum leaf-spot image segmentation method based on an improved DeeplabV3+ network to address the low accuracy of existing plant leaf-spot segmentation and recognition models, insufficient datasets, and slow training speeds. We replaced the backbone feature extraction network of the model’s encoder with the MobileNetV2 network, which greatly reduced the amount of computation in the model and improved its calculation speed. Attention mechanism modules were then introduced into the backbone feature extraction network and the decoder, which further improved the model’s edge recognition and segmentation accuracy. Given the category imbalance in the sweetgum leaf spot dataset (SLSD), a weighted loss function was introduced that assigns different weights to the spot and background classes, improving the segmentation of disease spot regions. Finally, we graded the severity of the lesions. The experimental results show that the PA, mRecall, and mIou of the improved model were 94.5%, 85.4%, and 81.3%, respectively, which are superior to the traditional DeeplabV3+, Unet, and Segnet models and other commonly used plant disease semantic segmentation methods. The model also performs well across different lesion severity grades, demonstrating that this method can effectively improve segmentation performance for sweetgum leaf spots.

1. Introduction

Sweetgum is a deciduous tree with high application value, particularly for soil and water conservation, ornamental use, raw material, and medicinal purposes [1,2]. As the planting area of this tree has continuously expanded, leaf spot disease of the sweetgum tree, characterized by brown spots of small area, has gradually worsened. The grade of leaf spot disease is generally determined by the area of the spots on the leaves. Using deep learning to segment leaf spots enables a quick evaluation of the disease situation, which is of great significance for garden protection.
In recent years, many detection and classification technologies for plant diseases have emerged. Nandini et al. [3] reviewed different image segmentation technologies, such as edge detection, thresholding, and clustering, which continue to show good potential for detecting diseases on plant leaves. Kumar et al. [4] proposed a method to classify three important leaf-surface diseases of banana using local texture characteristics and completed a comparative performance analysis with ten-fold cross-validation. Jothiaruna et al. [5] proposed a leaf-spot segmentation method for real environments using comprehensive color features and a region-growing method. These methods segment the target area using texture, color, shape, and other information. The preprocessing in traditional methods is highly task-specific, but plant diseases usually exhibit a variety of characteristics, and the manual extraction of features involves a degree of subjectivity, which poses great challenges for traditional plant disease detection and recognition algorithms.
The emergence of deep learning provides a new technology for plant disease identification and detection. Deep neural networks have a strong ability to learn and self-train and can obtain image information of plant diseases from multiple angles, which offers advantages for plant disease detection and prevention [6,7,8,9]. Lakshmi et al. [10] proposed an effective deep-learning framework for the automatic detection and segmentation of plant diseases using an improved CNN based on pixel-level mask regions. Mobeen et al. [11] used a CNN to classify the symptoms of plant diseases and proposed a stepwise transfer learning method, which helps the model converge faster. Liu et al. [12] addressed plant disease identification by reweighting visual regions and losses and using an LSTM network to encode the weighted patch feature sequence into a complete feature representation. Kaur et al. [13] proposed a neural-network-based technique, Modified InceptionResNet-V2 (MIR-V2), combined with transfer learning, to detect lesions on the leaves of different tomato plants; the model achieved a 98.92% accuracy in lesion detection. Chau et al. [14] used the deep-learning-based Yolo-v5 model, with stability information obtained from an autoencoder, to train on the Plant Village dataset; after training, the detection accuracy for lesions was 81.28%. Yuan et al. [15] proposed an improved DeepLabV3+ network for the segmentation of grape leaf black rot spots: they used ResNet101 as the backbone network, inserted a channel attention module into the residual module, and added a feature fusion branch based on a feature pyramid network to the encoder, integrating feature maps at different levels; the evaluation indicators of the improved network were greatly improved, enhancing the segmentation of grape leaf black rot spots. Hu et al. [16] used a CNN to segment tea leaf lesions and evaluate the degree of damage. Liang et al. [17] used a PD2SE-Net neural network to segment plant lesion areas and assess their damage, reporting an accuracy of more than 91%. The DeepLabV3+ model has mainly been applied to satellite images, medical images, urban scenes, plant branch segmentation, and element extraction [18,19,20,21,22], but there are few applications to plant diseases. Therefore, applying such a deep-learning network to plant disease images may be a fruitful, unexplored approach for the identification and detection of target diseases.
In 2018, Liang-Chieh Chen et al. [23] proposed the semantic segmentation network model DeepLabV3+. The network adds a decoder module to the original DeepLabV3 network to refine the semantic segmentation results, and it is currently the best-performing network in the DeepLab series. The encoder of DeepLabV3+ applies serial atrous (dilated) convolution in the Modified Aligned Xception backbone feature extraction network and divides the results into two parts. One part is passed to the atrous spatial pyramid pooling (ASPP) module, which consists of atrous convolutions with different dilation rates; ASPP encodes, concatenates, and fuses image context information and compresses it with a 1 × 1 convolution before passing it to the decoder. The other part is a shallow feature layer taken before the pooling of the Block2 module in the Modified Aligned Xception and passed directly into the decoder. In the decoder, the shallow feature output from the backbone feature extraction network goes through a 1 × 1 convolution to reduce the dimensions of the feature map, while the ASPP output compressed by the 1 × 1 convolution is upsampled by fourfold bilinear interpolation. The two feature maps are then concatenated so that the shallow, cross-layer features, which carry detailed information, enrich the semantic and detail information of the image. Finally, the merged feature map is upsampled by a factor of four to obtain a semantic segmentation map with the same size as the original image.
Our team developed a method that combines attention mechanism modules with the DeeplabV3+ network to improve the lesion segmentation accuracy for sweetgum leaves. Sweetgum leaf-spot images captured on the East Lake Campus of Zhejiang A & F University in 2021 were used as the research object for lesion segmentation. The experimental results show that the model proposed in this paper satisfies the requirements for pixel accuracy, mean recall, mean intersection over union, and other aspects and has a good segmentation effect on leaf spots.

2. Materials and Methods

2.1. Production of the Dataset

In this experimental study, the sweetgum leaf-spot images came from the sweetgum forest on the East Lake Campus of Zhejiang A & F University. The research area is located in the northwest of Zhejiang Province, from 118°51′ to 119°52′ east longitude and 29°56′ to 30°23′ north latitude. It has a warm, humid monsoon climate with ample sunshine and is suitable for the growth of sweetgum trees. The environmental background, natural light, shooting equipment, and other factors affect the images taken of the trees. The images were collected from May to July 2021, in the morning, at noon, and in the evening. The dataset also includes leaf-spot images taken on both cloudy and sunny days, which better reflect real observation conditions. In this experiment, sweetgum leaf lesions were photographed with an iPhone 7’s built-in camera, and 160 valid images were screened.
The image semantic segmentation annotation tool Labelme was used to annotate the spot images at the pixel level. The annotation data were stored in json format, and the data labels were converted to binary png images using the labelme_json_to_dataset command. The black part is the background and the red part is the leaf spot, as shown in Figure 1.
The experimental dataset was augmented because directly using the raw data for training can easily cause the model to overfit. The CVPR Fine-Grained Visual Classification Challenge uses data augmentation operations to amplify raw data; this method enhances the generalization ability of the model and gives the images translation and flip invariance. The Albumentations data augmentation library was used to randomly adjust the brightness of the annotated images and to crop, flip, and shift them, amplifying the data 2, 4, 3, and 2 times, respectively, for a total factor of 48. The images were then processed to a resolution of 400 × 400, yielding 7680 experimental images in total, which form the Sweetgum Leaf Spot Dataset (SLSD). Figure 2 shows the augmented results for some of the data.
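As an illustration only, the following is a minimal sketch of the kind of augmentation pipeline described above, written with the Albumentations library; the transform choices, probabilities, and file names are assumptions for demonstration rather than the exact settings used in this study.

```python
import albumentations as A
import cv2

# Illustrative augmentation pipeline: random brightness, random crop to 400 x 400,
# flips, and shifts applied jointly to each image and its pixel-level mask.
augment = A.Compose([
    A.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.0, p=0.5),
    A.RandomCrop(height=400, width=400, p=1.0),
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.ShiftScaleRotate(shift_limit=0.1, scale_limit=0.0, rotate_limit=0, p=0.5),
])

image = cv2.imread("leaf.jpg")           # hypothetical input image
mask = cv2.imread("leaf_mask.png", 0)    # corresponding binary lesion mask

augmented = augment(image=image, mask=mask)
aug_image, aug_mask = augmented["image"], augmented["mask"]
```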

2.2. Segmentation Method of Diseased Spots on Leaves of Sweetgum

To reduce the number of parameter calculations in the model and improve its calculation speed, the traditional Modified Aligned Xception network used for backbone feature extraction in the DeepLabV3+ model is replaced by the lightweight MobileNetV2 network in the encoder. To train a more powerful semantic segmentation model and obtain better segmentation accuracy, this paper adds an SA module [24] to the encoder and CBAM [25] to the decoder. To address the class imbalance of the SLSD, a weighted loss function is introduced to assign different weights to the lesion class and the background class, improving the accuracy of the model’s segmentation of lesion areas.

2.3. Backbone Feature Extraction Network

In the encoder section, we changed the Modified Aligned Xception network used for backbone feature extraction networks in the traditional DeepLabv3+ to a lightweight MobileNetV2 network.
The MobileNet model is a lightweight deep neural network proposed by Google for embedded devices such as mobile phones. MobileNetV2 [26] is an upgraded version of MobileNetV1 [27]. A convolution is usually followed by a ReLU function; MobileNetV1 uses ReLU6, whose maximum output is limited to 6. MobileNetV2 replaces the ReLU6 after the final pointwise convolution with a linear output. Xception research experiments have shown that introducing ReLU activation after depthwise convolution is less effective and also leads to information loss [28].
In the feature extraction operation, the neural network extracts the useful information of the target, which can be embedded in a low-dimensional subspace. In a traditional network structure, convolutions are followed by the ReLU activation function, but applying ReLU in a low-dimensional space loses more useful information. In the linear bottleneck structure, the ReLU activation is therefore replaced with a linear function to reduce the loss of useful network information.
The inverted residual structure used in the MobileNetV2 network consists of three parts: first, a 1 × 1 convolution increases the dimension of the input features; then, a 3 × 3 depthwise separable convolution performs feature extraction; and finally, a 1 × 1 convolution reduces the dimension again, as sketched below.
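The following is a minimal PyTorch sketch of such an inverted residual block (1 × 1 expansion, 3 × 3 depthwise convolution, 1 × 1 linear projection); the class name and default parameters are illustrative and not taken from the exact MobileNetV2 implementation used in this work.

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """Sketch of a MobileNetV2-style inverted residual block:
    1x1 expansion -> 3x3 depthwise conv -> 1x1 linear projection."""
    def __init__(self, in_ch, out_ch, stride=1, expand=6):
        super().__init__()
        hidden = in_ch * expand
        self.use_residual = (stride == 1 and in_ch == out_ch)
        self.block = nn.Sequential(
            # 1x1 convolution expands the channel dimension
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 3x3 depthwise convolution extracts spatial features
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1,
                      groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 1x1 convolution projects back down; no ReLU (linear bottleneck)
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out
```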
The structural parameters of the MobileNetV2 used in this experiment are shown in Table 1, where t is the expansion factor; c is the depth of the output feature matrix; n is the number of repeats of the Bottleneck, which refers to the inverted residual structure; and s is the stride.

2.4. Convolutional Block Attention Module

Starting from the channel and spatial scopes, CBAM introduces two analytical dimensions, channel attention and spatial attention, to realize a sequential attention structure from channel to space. The Spatial Attention Module (SAM) makes the neural network pay more attention to the pixel regions of the image that determine lesion segmentation and ignore irrelevant regions. The Channel Attention Module (CAM) handles the allocation of weights across feature map channels. Allocating attention over both dimensions strengthens the improvement in model performance brought by the attention mechanism. The structure of CBAM is shown in Figure 3.
The specific process of CAM is as follows: the input feature map F is passed through global max pooling and global average pooling over its width and height, respectively, to obtain two 1 × 1 × C feature maps, which are then fed separately into a shared two-layer neural network (MLP). The MLP outputs are added element-wise and activated by a Sigmoid function to generate the final channel attention feature, M_C. Finally, M_C is multiplied element-wise with the input feature map F to generate the input features required by the spatial attention module. The structure of the channel attention module is shown in Figure 4.
The specific calculation is as follows:
M_c(F) = \sigma\big(W_1(W_0(F_{avg}^{c})) + W_1(W_0(F_{max}^{c}))\big)
In the formula, σ represents the Sigmoid activation function; W_0 ∈ R^{C/r×C} and W_1 ∈ R^{C×C/r}. Note that the MLP weights W_0 and W_1 are shared for the two inputs, and W_0 is followed by a ReLU activation function.
The specific process of SAM is as follows: the feature map F′ output by the channel attention module is taken as the input of this module. First, channel-wise global max pooling and global average pooling produce two H × W × 1 feature maps, which are concatenated along the channel dimension. A 7 × 7 convolution then reduces the result to a single channel, and after Sigmoid activation the spatial attention feature, M_S, is generated. Finally, M_S is multiplied with the module’s input feature to obtain the final generated features. The structure of the spatial attention module is shown in Figure 5.
The specific calculation is as follows:
M_s(F) = \sigma\big(f^{7 \times 7}([\mathrm{AvgPool}(F); \mathrm{MaxPool}(F)])\big) = \sigma\big(f^{7 \times 7}([F_{avg}^{s}; F_{max}^{s}])\big)
In the formula: σ represents the activation function, and f7×7 represents the convolution operation with a filter size of 7 × 7.
The low-level features obtained from the shallow network are used directly as input in the decoding stage, which introduces a large number of background features and affects the segmentation results. With CBAM added, the channel attention mechanism assigns greater weight to channels that respond strongly to the target object, while the spatial attention mechanism pays more attention to the foreground area and focuses on the characteristics of the target region, which helps to generate a more effective feature map.
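A compact PyTorch sketch of the CBAM computation described above is given below; the channel reduction ratio, kernel size, and class names are illustrative assumptions rather than the exact configuration used in this study.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """CAM: global max/avg pooling -> shared MLP -> sigmoid weights per channel."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average pooling branch
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max pooling branch
        w = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * w

class SpatialAttention(nn.Module):
    """SAM: channel-wise avg/max maps -> concat -> 7x7 conv -> sigmoid mask."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx = x.amax(dim=1, keepdim=True)
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * w

class CBAM(nn.Module):
    """Applies channel attention, then spatial attention, as in Figure 3."""
    def __init__(self, channels):
        super().__init__()
        self.cam = ChannelAttention(channels)
        self.sam = SpatialAttention()

    def forward(self, x):
        return self.sam(self.cam(x))
```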

2.5. Shuffle Attention Module

The SA module uses “channel splitting” to process the sub-features of each group in parallel. For the channel attention branch, global average pooling (GAP) is used to generate channel statistics, and then a pair of parameters is used to scale and shift the channel vector. For the spatial attention branch, Group Normalization is used to generate spatial statistics, and then a compact feature similar to that of the channel branch is created. The two branches are then concatenated. After that, all sub-features are aggregated, and, finally, the “channel shuffle” operator is used to enable information communication between different sub-features. As shown in Figure 6, one branch captures the relationships between channels to generate a channel attention feature map, and the other captures spatial relationships to generate a spatial attention feature map.
Channel Attention: For a given feature map X ∈ R^{C×H×W}, where C, H, and W represent the channel number, spatial height, and spatial width, respectively, the SA module first divides X into G groups along the channel dimension, that is, X = [X_1, …, X_G], with X_k ∈ R^{C/G×H×W}, where each sub-feature X_k gradually captures a specific semantic response during training. Then, a corresponding importance coefficient is generated for each sub-feature through the attention module. Specifically, at the beginning of each attention unit, the input X_k is split into two branches along the channel dimension, X_{k1}, X_{k2} ∈ R^{C/2G×H×W}. To balance speed and accuracy, global information is first embedded by simply using the Global Average Pool to generate the channel statistics s ∈ R^{C/2G×1×1}, which is calculated by shrinking X_{k1} over its spatial dimensions H × W:
s = F_{gp}(X_{k1}) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} X_{k1}(i, j)
Then, the Sigmoid activation function is used to aggregate the information into a compact feature map, and the final output of the channel attention branch is:
X'_{k1} = \sigma(F_c(s)) \cdot X_{k1} = \sigma(W_1 s + b_1) \cdot X_{k1}
In the formula, W_1 ∈ R^{C/2G×1×1} and b_1 ∈ R^{C/2G×1×1} are parameters used for scaling and shifting, respectively.
Spatial Attention: Spatial Attention supplements Channel Attention. First, Group Norm (GN) is used on Xk2 to obtain spatial statistics, and then Fc(·) is used to enhance the representation of Xk2. The final output is:
X'_{k2} = \sigma(W_2 \cdot GN(X_{k2}) + b_2) \cdot X_{k2}
In the formula, W_2 and b_2 are parameters in R^{C/2G×1×1}.
Finally, the two branches are concatenated so that the number of channels equals the number of input channels, that is, X'_k = [X'_{k1}, X'_{k2}] ∈ R^{C/G×H×W}.
The input image is passed through the backbone network for feature extraction. Fusing the channel attention mechanism and the spatial attention mechanism not only reduces the loss of local information but also captures global information with long-range dependencies, effectively improving the feature extraction capability. A simplified sketch of the SA module is given below.
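Below is a simplified PyTorch sketch of the SA module as described above (channel split, parallel channel and spatial branches, concatenation, channel shuffle); the group number and parameter initializations are assumptions for illustration, not the exact SA-Net implementation.

```python
import torch
import torch.nn as nn

class ShuffleAttention(nn.Module):
    """Simplified SA sketch: split channels into groups, run a channel-attention
    branch (GAP + scale/shift) and a spatial-attention branch (GroupNorm +
    scale/shift) in parallel, concatenate, then channel-shuffle."""
    def __init__(self, channels, groups=8):
        super().__init__()
        self.groups = groups
        half = channels // (2 * groups)
        # learnable scale/shift parameters for the two branches
        self.cw = nn.Parameter(torch.zeros(1, half, 1, 1))
        self.cb = nn.Parameter(torch.ones(1, half, 1, 1))
        self.sw = nn.Parameter(torch.zeros(1, half, 1, 1))
        self.sb = nn.Parameter(torch.ones(1, half, 1, 1))
        self.gn = nn.GroupNorm(half, half)

    def forward(self, x):
        b, c, h, w = x.shape
        x = x.view(b * self.groups, c // self.groups, h, w)
        x_c, x_s = x.chunk(2, dim=1)                     # split along channels

        # channel attention: global average pooling -> scale/shift -> sigmoid
        s = x_c.mean(dim=(2, 3), keepdim=True)
        x_c = x_c * torch.sigmoid(self.cw * s + self.cb)

        # spatial attention: group norm -> scale/shift -> sigmoid
        x_s = x_s * torch.sigmoid(self.sw * self.gn(x_s) + self.sb)

        out = torch.cat([x_c, x_s], dim=1).view(b, c, h, w)

        # channel shuffle so information can flow across groups
        out = out.view(b, 2, c // 2, h, w).transpose(1, 2).reshape(b, c, h, w)
        return out
```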

2.6. Weighted Loss Function

For an input sample, the difference between the model’s output value and the sample’s true value is called the loss, and the loss function is the function that describes this difference. For a deep learning model, the neural network weights are trained through backpropagation of the loss, so the loss function determines the training effect of the model and is crucial. In this study, the SLSD annotations are divided into two categories, disease spot and background, so multi-class cross-entropy is used as the loss function. However, because background pixels make up a large proportion of the dataset, the network tends to learn background characteristics during training and cannot effectively extract the characteristics of the disease spot area, resulting in low segmentation accuracy for spot areas. To solve this problem caused by the class imbalance in the SLSD, a weighted loss function is introduced: based on the original multi-class cross-entropy loss, the disease spot class and the background class are given different weights. The specific calculation formula is as follows:
L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{C} W_j \, t_j^i \ln(y_j^i)
In the formula, N is the total number of pixels, C is the total number of categories, i indexes the training pixels, j indexes the categories, t_j^i is the true disease spot category of the i-th annotated training pixel, y_j^i is the disease spot category predicted for the i-th training pixel, W_j = (N − N_j)/N is the weight parameter of category j, and N_j is the number of pixels of category j.
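A minimal PyTorch sketch of this weighted cross-entropy loss is shown below; the pixel counts and class order are hypothetical, and passing per-class weights to nn.CrossEntropyLoss is one straightforward way to realize the weighting, not necessarily the exact implementation used here.

```python
import torch
import torch.nn as nn

def class_weights_from_counts(pixel_counts):
    """Weight each class j by (N - N_j) / N, so the rarer lesion class
    receives a larger weight than the dominant background class."""
    counts = torch.tensor(pixel_counts, dtype=torch.float32)
    total = counts.sum()
    return (total - counts) / total

# hypothetical pixel counts: [background, lesion]
weights = class_weights_from_counts([9_500_000, 500_000])

# per-class weights passed directly to PyTorch's cross-entropy loss
criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(4, 2, 400, 400)            # model output: (N, C, H, W)
target = torch.randint(0, 2, (4, 400, 400))     # ground-truth class per pixel
loss = criterion(logits, target)
```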

2.7. Image Semantic Segmentation Algorithm Based on the Improved DeeplabV3+

The improved model structure is shown in Figure 7.
The algorithm proposed in this article is based on an encoder–decoder structure. The encoder is the feature extraction network, gradually obtaining feature maps and capturing higher-level semantic information; the decoder projects the semantic features learned by the encoder back into pixel space to realize pixel-level segmentation. In the encoder, the MobileNetV2 network is used to build the backbone feature extraction network, and an SA module is added to it, which increases the accuracy of the algorithm without affecting the speed of the network. Atrous Spatial Pyramid Pooling (ASPP) is connected behind the MobileNetV2 backbone. ASPP samples dilated convolutions with different sampling rates in parallel, which is equivalent to capturing the image context at multiple scales. Dilated convolution inserts holes into the convolution kernel during the convolution operation to expand the receptive field so that each convolution output contains a larger range of information. The ASPP module outputs high-level features after a 1 × 1 convolution. The feature map corresponding to the solid blue box in the encoder section of Figure 7 is visualized in Figure 7B.
In the decoder, the low-level features processed by CBAM are adjusted by a 1 × 1 convolution; these features are visualized in the solid blue box in the decoder section of Figure 7A. They are then fused with the encoder features that have been upsampled four times. After a 3 × 3 convolution and further upsampling, the features are gradually refined, the spatial information is recovered, and the segmentation result map is finally obtained. A sketch of this decoder fusion is given below.
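The following sketch illustrates this decoder fusion in PyTorch; the channel sizes, layer choices, and the attention placeholder are assumptions for illustration (the CBAM sketched in Section 2.4 could be plugged in) and do not reproduce the exact improved DeepLabV3+ implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImprovedDecoder(nn.Module):
    """Sketch of the decoder fusion described above: low-level features pass
    through an attention module and a 1x1 convolution, the ASPP output is
    upsampled to match them, the two are concatenated, refined by a 3x3
    convolution, and upsampled to the input resolution."""
    def __init__(self, low_ch=24, aspp_ch=256, num_classes=2, attention=None):
        super().__init__()
        # plug in the CBAM sketch here; Identity is only a placeholder
        self.attention = attention if attention is not None else nn.Identity()
        self.reduce = nn.Sequential(
            nn.Conv2d(low_ch, 48, 1, bias=False),
            nn.BatchNorm2d(48),
            nn.ReLU(inplace=True),
        )
        self.refine = nn.Sequential(
            nn.Conv2d(aspp_ch + 48, 256, 3, padding=1, bias=False),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, num_classes, 1),
        )

    def forward(self, low_feat, aspp_feat, out_size):
        low = self.reduce(self.attention(low_feat))
        # upsample the ASPP output (roughly 4x) to the low-level feature size
        high = F.interpolate(aspp_feat, size=low.shape[2:],
                             mode="bilinear", align_corners=False)
        fused = torch.cat([high, low], dim=1)
        logits = self.refine(fused)
        # restore the original input resolution
        return F.interpolate(logits, size=out_size,
                             mode="bilinear", align_corners=False)

# toy usage with hypothetical feature shapes
decoder = ImprovedDecoder()
low_feat = torch.randn(1, 24, 100, 100)     # shallow backbone features
aspp_feat = torch.randn(1, 256, 25, 25)     # ASPP output
out = decoder(low_feat, aspp_feat, out_size=(400, 400))
print(out.shape)                            # torch.Size([1, 2, 400, 400])
```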

2.8. Evaluation Indicators

This paper evaluates the segmentation performance on the SLSD using the Pixel Accuracy (PA), Mean Intersection over Union (mIou), and Mean Recall (mRecall) indicators.
PA is the ratio of the correct pixels to the total pixels. The calculation formula is as follows:
PA = \sum_{i=1}^{k} P_{ii} \Big/ \sum_{i=1}^{k} \sum_{j=1}^{k} P_{ij}
mIou is the most commonly used evaluation index in semantic segmentation research. For each class, the ratio of the intersection to the union of the ground-truth and predicted sets is calculated, and the results are then averaged over all classes. The calculation formula is as follows:
mIou = \frac{1}{k} \sum_{i=1}^{k} P_{ii} \Big/ \left( \sum_{j=1}^{k} P_{ij} + \sum_{j=1}^{k} P_{ji} - P_{ii} \right)
mRecall is the average value of the ratio of the number of pixels correctly classified in each class to the number of pixels predicted for this category. The calculation formula is as follows:
mRecall = \frac{1}{k} \sum_{i=1}^{k} P_{ii} \Big/ \left( P_{ii} + \sum_{j=1}^{k} P_{ji} \right)
where k is the total number of categories, P_ij represents the number of pixels that belong to class i but are predicted as class j, P_ii represents the number of correctly predicted pixels, and P_ij and P_ji correspond to false negatives and false positives, respectively.
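A short sketch of how these indicators can be computed from a pixel-level confusion matrix is given below, using the standard confusion-matrix definitions (recall normalized by ground-truth pixels); the function and variable names are illustrative, assuming the convention P[i, j] = pixels of class i predicted as class j.

```python
import numpy as np

def segmentation_metrics(pred, gt, num_classes=2):
    """Compute PA, mIou, and mRecall from a pixel-level confusion matrix,
    where conf[i, j] counts pixels of true class i predicted as class j."""
    conf = np.zeros((num_classes, num_classes), dtype=np.int64)
    for i in range(num_classes):
        for j in range(num_classes):
            conf[i, j] = np.sum((gt == i) & (pred == j))

    tp = np.diag(conf)
    pa = tp.sum() / conf.sum()                                  # pixel accuracy
    iou = tp / (conf.sum(axis=1) + conf.sum(axis=0) - tp)       # per-class IoU
    recall = tp / conf.sum(axis=1)                              # per-class recall
    return pa, iou.mean(), recall.mean()

# toy usage with hypothetical 400 x 400 prediction and ground-truth masks
pred = np.random.randint(0, 2, (400, 400))
gt = np.random.randint(0, 2, (400, 400))
print(segmentation_metrics(pred, gt))
```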

2.9. Disease Spot Classification

Since there is no established grading standard for the degree of leaf spot, to analyze the grade of sweetgum leaf spot more accurately, this paper formulated a grading standard for sweetgum leaf spot by referring to the Technical Specification for Detection and Reporting of Wheat Scab (GB/T 15796-2011) of the People’s Republic of China. Based on pixel-point statistics, Python was used to count the pixels of the segmented disease spot area, and the leaves were divided into four levels: level 1, level 2, level 3, and level 4. The criteria for grading leaf lesions are shown in Table 2:
Here, k is the proportion of the lesion area in the whole image, calculated as follows:
k = \frac{A_{scab}}{A_{image}} = \frac{\sum_{(x,y) \in R_{scab}} n(x,y)}{\sum_{(x,y) \in R_{image}} n(x,y)}
In the formula, A_scab is the area of the lesion region, A_image is the area of the whole image, R_scab denotes the lesion region, and R_image denotes the image region. A simple sketch of this pixel-counting grading step is shown below.
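The following is a minimal sketch of grading a leaf from its predicted segmentation mask using the thresholds in Table 2; the mask encoding (1 = lesion) and the example mask are assumptions for illustration.

```python
import numpy as np

def grade_lesion(mask):
    """Grade a leaf by the ratio k of lesion pixels to total pixels in the
    segmentation mask (lesion pixels are assumed to be labeled 1)."""
    k = np.count_nonzero(mask == 1) / mask.size
    if k <= 0.02:
        return 1
    elif k <= 0.04:
        return 2
    elif k <= 0.06:
        return 3
    return 4

# hypothetical predicted mask: 0 = background, 1 = lesion
mask = np.zeros((400, 400), dtype=np.uint8)
mask[:50, :50] = 1                   # a lesion covering about 1.6% of the image
print(grade_lesion(mask))            # -> 1
```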

3. Results

3.1. The Experimental Process

The CPU used in this experimental environment is an Intel i7-10700, the operating system is Windows 10, the GPU is an NVIDIA GeForce RTX 2060S with 8 GB of video memory, and the development environment is PyTorch 1.9.0 and Python 3.8. To improve the accuracy of the model, MobileNetV2 pre-trained weights are loaded before training, the weighted loss function is used as the loss function of the segmentation model, and different weights are assigned to the lesions and background to improve the model’s segmentation accuracy for the lesion area. The training set with annotation information was fed into the improved DeepLabV3+ network for training.
For the SLSD, 60% of the images are randomly selected as the training set, 20% as the validation set, and 20% as the test set. Because a learning rate that is too small or too large will cause the model to converge slowly or not at all, an appropriate initial learning rate must be determined. In this paper, the model was trained and tested with three initial learning rates, and the results are shown in Figure 8. It can be seen that when the learning rate is 0.0005 and the epoch is 100, the accuracy of the model is the highest.
The training results of the DeepLabV3+ network and the improved DeepLabV3+ network are shown in Figure 9 and Figure 10, in which red represents the DeepLabV3+ network training process and gray represents the improved DeepLabV3+ network training process. It can be seen from the figure that when the number of training iterations exceeds 90, the loss value and accuracy tend to stabilize. The loss value of the two decreases with the increase in the number of iterations, and the accuracy increases with the increase in the number of iterations and gradually stabilizes. However, compared with the traditional DeepLabV3+ network, the improved DeepLabV3+ network converges faster and fluctuates less later on. This is because the attention module effectively combines the spatial channel dimensions, obtains the global dependency between features, captures context feature information, enhances the expression ability of features, strengthens the network learning ability, and eliminates interference information.

3.2. The Segmentation Index Comparison of the Model

To test the segmentation performance and stability of the proposed model for different grades of disease spots, the traditional DeeplabV3+, MobileNetV2 + DeeplabV3+, and attention mechanism + MobileNetV2 + DeeplabV3+ methods discussed in this article are compared on the graded SLSD. As shown in Table 3 below, V-DV3+ denotes the traditional DeeplabV3+ network with MobileNetV2, and AM-V-DV3+ denotes the traditional DeeplabV3+ with the MobileNetV2 network and the two added attention mechanism modules.
Table 3 shows that all models in the experiment have the highest segmentation accuracy for the level 1 category, which may be caused by the small leaf disease spot area of level 1. Figure 11 shows the comparison of the segmentation accuracy of each model for graded disease spots.
It is obvious from Figure 11 and Table 3 that the proposed model improves the segmentation accuracy for the level 1 category by 3.7%, 2.5%, and 1.1% compared with the DeeplabV3+, V-DV3+, and AM-V-DV3+ models, respectively. The proposed model is also improved over the DeeplabV3+, V-DV3+, and AM-V-DV3+ models for the other three disease spot grades. The accuracy of the proposed model is higher than 93% for the segmentation of all four lesion grades, indicating good segmentation performance and stability. The model can provide a basis for disease prevention and control in the future, with corresponding control schemes proposed according to different degrees of disease spots.

4. Discussion

4.1. Ablation Experiment

The Modified Aligned Xception network originally used for backbone feature extraction was changed to the MobileNetV2 network. To verify the validity of adding attention mechanism modules to the backbone feature extraction network and the decoder, four groups of experiments with different schemes were set up. The experimental results are shown in Table 4.
Scheme 1: Based on the traditional DeeplabV3+ network structure, the backbone feature network is replaced with the MobileNetV2 network;
Scheme 2: Based on Scheme 1, the SA module is added to the backbone feature extraction network;
Scheme 3: Based on Scheme 1, CBAM is added to the decoder;
Scheme 4: Based on Scheme 1, attention mechanism modules are added to both the backbone feature extraction network and the decoder.
The results of the ablation experiment show that, compared with the traditional DeeplabV3+ model, the model parameters and training time of Scheme 1 are significantly reduced, indicating that replacing the backbone feature extraction network is effective. Scheme 2 and Scheme 3 improve the PA, mRecall, and mIou values compared with the traditional DeeplabV3+. The increase is larger for Scheme 4, indicating that introducing attention mechanisms in both the encoder and the decoder improves accuracy more obviously.
To verify the effectiveness of the weighted loss function in improving the segmentation accuracy for the disease spot region, a detailed comparison was carried out between Scheme 4 and our method. The experimental results are shown in Table 5. The comparison shows that, compared with Scheme 4, our method improves the Recall of disease spots but decreases slightly in the Recall of the background, and it improves the Iou of both the disease spots and the background. Overall, the PA, mRecall, and mIou of our method are improved compared with Scheme 4. The experimental data show that introducing the weighted loss function improves the segmentation accuracy for the disease spot region and thus the overall segmentation accuracy of the model.

4.2. Comparison of the Performance of Different Segmentation Methods

To further verify the segmentation performance of the improved DeeplabV3+ model, the method is compared with traditional semantic segmentation models commonly used for plant diseases, namely DeeplabV3+, Unet [29], and Segnet [30]. The comparison results are shown in Table 6.
From the experimental results in Table 6, our model outperforms the other segmentation networks in every respect. The PA of the proposed algorithm is 94.5%, which is 4.2%, 6.2%, and 4.8% higher than the traditional DeeplabV3+, Unet, and Segnet models, respectively. The mRecall is 85.4%, which is 4.3%, 6.7%, and 5.1% higher than the traditional DeeplabV3+, Unet, and Segnet, respectively. The mIou is 81.3%, which is 2.4%, 4.9%, and 1.9% higher than the traditional DeeplabV3+, Unet, and Segnet, respectively. The experimental data show that introducing the two different dual-dimension attention mechanisms and the weighted loss function enhances the expression ability of the features.
Compared with the traditional DeeplabV3+ and Unet, the proposed algorithm’s training time is significantly shorter. This is because, in a neural network, the greater the number of layers in the model, the more parameters there are, the more complex the model is, and the more difficult it is to train. Compared with Segnet, however, the training time is longer, because the structure of Segnet itself is relatively simple; yet its segmentation accuracy is not as good as that of the proposed algorithm. To sum up, the proposed algorithm replaces the backbone feature extraction network with MobileNetV2, which reduces the amount of parameter calculation in the model and thus improves the calculation speed.
This paper also visually displays the segmentation results of the four algorithms, as shown in Figure 12, where red represents the lesions and black represents the background. Compared with Unet and Segnet, the traditional DeeplabV3+ and the proposed algorithm achieve a higher recognition rate for the edges of the disease spot area, because DeeplabV3+ uses atrous convolutions with different rates in the encoding part for feature extraction, which enhances the recognition of objects of different sizes and recovers the feature information of region edges. Compared with the traditional DeeplabV3+, the proposed algorithm reduces missed detections of disease spots and improves the edge recognition of the disease spot area. Compared with Unet and Segnet, the segmentation of disease spots is better, which shows that adding the two different dual-dimension attention mechanisms strengthens the extraction of edge and region features of disease spots, and the weighted loss function reduces the loss of feature information.

5. Conclusions

In this paper, an improved spot segmentation method based on the DeeplabV3+ network is proposed. Attention mechanism modules are introduced into the backbone feature extraction network and the decoder to improve the feature extraction ability of the model, so that the edges of the spot region are recognized better and the segmentation accuracy is higher. To solve the SLSD class imbalance problem, a weighted loss function is introduced to improve the segmentation accuracy of the model. At the same time, the Modified Aligned Xception network used for backbone feature extraction is replaced with the MobileNetV2 network to reduce the amount of parameter calculation in the model and improve its calculation speed. Experiments were carried out on the SLSD, and the results show that the method can effectively extract the diseased spots of sweetgum leaves and achieve more accurate and efficient segmentation. However, the segmentation performance on leaf spot images of other plants is unknown. In the next step, pre-training will be carried out on other plant leaf spot datasets to realize the segmentation and recognition of different disease spots in real environments, and our team will combine the treatment methods for different lesions with the improved model.

Author Contributions

Conceptualization, Funding acquisition, Methodology, X.Y.; Data curation, Formal analysis, Investigation, Software, Visualization, Writing—original draft, M.C.; Supervision, Validation, G.W.; Project administration, L.M.; Resources, P.W.; Writing—review and editing, C.M., K.E.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zhang, Y.; Wang, Z.; Qi, S.; Wang, X.; Zhao, J.; Zhang, J.; Li, B.; Zhang, Y.; Liu, X.; Yuan, W. In Vitro Tetraploid Induction from Leaf and Petiole Explants of Hybrid Sweetgum (Liquidambar styraciflua × Liquidambar formosana). Forests 2017, 8, 264. [Google Scholar] [CrossRef] [Green Version]
  2. Li, Y.; Wan, Y.; Lin, W.; Ernstsons, A.; Gao, L. Estimating Potential Distribution of Sweetgum Pest Acanthotomicus suncei and Potential Economic Losses in Nursery Stock and Urban Areas in China. Insects 2021, 12, 155. [Google Scholar] [CrossRef] [PubMed]
  3. Nandini, K.; Singh Pratik, P.K. Detection of Disease in Plant Leaf using Image Segmentation. Int. J. Comput. Appl. 2019, 35, 29–32. [Google Scholar]
  4. Kumar, C.S.; Mathew, D.; Cherian, K.A. Foliar fungal disease classification in banana plants using an elliptical local binary pattern on multiresolution dual tree complex wavelet transform domain. Inf. Process. Agric. 2021, 8, 581–592. [Google Scholar]
  5. Jothiaruna, N.; Sundar, K.J.A.; Karthikeyan, B. A segmentation method for disease spot images incorporating chrominance in Comprehensive Color Feature and Region Growing. Comput. Electron. Agric. 2019, 165, 104934. [Google Scholar] [CrossRef]
  6. Pavan, K.V.; Rao, E.G.; Anitha, G. Plant Disease Detection using Convolutional Neural Networks. In Proceedings of the 2021 5th International Conference on Trends in Electronics and Informatics (ICOEI), Gwalior, India, 18–20 November 2021. [Google Scholar]
  7. Bedi, P.; Gole, P. Plant disease detection using hybrid model based on convolutional autoencoder and convolutional neural network. Artif. Intell. Agric. 2021, 5, 90–101. [Google Scholar] [CrossRef]
  8. Khan, K.; Khan, R.U.; Albattah, W.; Qamar, A.M. End-to-End Semantic Leaf Segmentation Framework for Plants Disease Classification. Complexity 2022, 2022, 1168700. [Google Scholar] [CrossRef]
  9. Waheed, H.; Zafar, N.; Akram, W.; Manzoor, A.; Gani, A.; Islam, S.U. Deep Learning Based Disease, Pest Pattern and Nutritional Deficiency Detection System for “Zingiberaceae” Crop. Agriculture 2022, 12, 742. [Google Scholar] [CrossRef]
  10. Kavitha Lakshmi, R.; Savarimuthu, N. DPD-DS for plant disease detection based on instance segmentation. J. Ambient. Intell. Humaniz. Comput. 2021, 1–11. [Google Scholar] [CrossRef]
  11. Mobeen, A.; Muhammad, A.; Hyeonjoon, M.; Dongil, H. Plant Disease Detection in Imbalanced Datasets Using Efficient Convolutional Neural Networks with Stepwise Transfer Learning. IEEE Access 2021, 9, 140565–140580. [Google Scholar]
  12. Xinda, L.; Weiqing, M.; Shuhuan, M.; Lili, W.; Shuqiang, J. Plant Disease Recognition: A Large-Scale Benchmark Dataset and a Visual Region and Loss Reweighting Approach. IEEE Trans. Image Process. 2021, 30, 2003–2015. [Google Scholar]
  13. Kaur, P.; Harnal, S.; Gautam, V.; Singh, M.P.; Singh, S.P. A novel transfer deep learning method for detection and classification of plant leaf disease. J. Ambient Intell. Humaniz. Comput. 2022, 1–18. [Google Scholar] [CrossRef]
  14. Chau, D.H.; Tran, D.C.; Vo, H.N.; Do, T.T.; Nguyen, T.H.; Nguyen, B.Q.; Debnath, N.C.; Nguyen, V.D. Plant Leaf Diseases Detection and Identification Using Deep Learning Model. In Proceedings of the 8th International Conference on Advanced Machine Learning and Technologies and Applications (AMLTA2022), AMLTA 2022, Cairo, Egypt, 5–7 May 2022; Hassanien, A.E., Rizk, R.Y., Snášel, V., Abdel-Kader, R.F., Eds.; Lecture Notes on Data Engineering and Communications Technologies. Springer: Cham, Switzerland, 2022; Volume 113, pp. 3–10. [Google Scholar] [CrossRef]
  15. Yuan, H.; Zhu, J.; Wang, Q.; Cheng, M.; Cai, Z. An Improved DeepLab v3+ Deep Learning Network Applied to the Segmentation of Grape Leaf Black Rot Spots. Front. Plant Sci. 2022, 13, 795410. [Google Scholar] [CrossRef] [PubMed]
  16. Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 2011–2023. [Google Scholar] [CrossRef] [Green Version]
  17. Liang, Q.; Xiang, S.; Hu, Y.; Coppola, G.; Zhang, D.; Sun, W. PD 2 SE-Net: Computer-assisted plant disease diagnosis and severity estimation network. Comput. Electron. Agric. 2019, 157, 518–529. [Google Scholar] [CrossRef]
  18. Yu, L.; Zeng, Z.; Liu, A.; Xie, X.; Wang, H.; Xu, F.; Hong, W. A Lightweight Complex-Valued DeepLabv3+ for Semantic Segmentation of PolSAR Image. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 930–943. [Google Scholar] [CrossRef]
  19. Ono, H.; Murakami, S.; Kamiya, T.; Aoki, T. Automatic Segmentation of Finger Bone Regions from CR Images Using Improved DeepLabv3+. In Proceedings of the 2021 21st International Conference on Control, Automation and Systems (ICCAS), Jeju, Korea, 12–15 October 2021; pp. 1788–1791. [Google Scholar] [CrossRef]
  20. Peng, H.; Xue, C.; Shao, Y.; Chen, K.; Xiong, J.; Xie, Z.; Zhang, L. Semantic Segmentation of Litchi Branches Using DeepLabV3+ Model. IEEE Access 2020, 8, 164546–164555. [Google Scholar] [CrossRef]
  21. Das, S.; Fime, A.A.; Siddique, N.; Hashem, M.M.A. Estimation of Road Boundary for Intelligent Vehicles Based on DeepLabV3+ Architecture. IEEE Access 2021, 9, 121060–121075. [Google Scholar] [CrossRef]
  22. Hu, Z.; Zhao, J.; Luo, Y.; Ou, J. Semantic SLAM Based on Improved DeepLabv3+ in Dynamic Scenarios. IEEE Access 2022, 10, 21160–21168. [Google Scholar] [CrossRef]
  23. Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2018. [Google Scholar]
  24. Yang, Y.B. SA-Net: Shuffle Attention for Deep Convolutional Neural Networks. In Proceedings of the ICASSP 2021–2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021. [Google Scholar]
  25. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2018. [Google Scholar]
  26. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
  27. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
  28. Deni, S.; Oktay, Y. LEMO XINET: Lite ensemble MobileNetV2 and Xception models to predict plant disease. Ecol. Inform. 2022, 70, 101698. [Google Scholar]
  29. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer International Publishing: Berlin/Heidelberg, Germany, 2015. [Google Scholar]
  30. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmen-tation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Image marking results.
Figure 2. Data Enhancement.
Figure 3. CBAM Structure Diagram.
Figure 4. CAM Structure Diagram.
Figure 5. SAM Structure Diagram.
Figure 6. SA Overall Architecture Diagram.
Figure 7. Structure diagram of improved DeepLabV3+ model. (A) and (B) are feature maps.
Figure 8. Relationship between model accuracy, learning rate, and iteration times.
Figure 9. Training loss value change curve.
Figure 10. Training accuracy change curve.
Figure 11. Comparison of segmentation accuracy of graded disease spots by various models.
Figure 12. Comparison diagram of the segmentation effect.
Table 1. MobileNetV2 structure parameter table.

Input          Operator         t    c      n    s
224² × 3       Conv2d           –    32     1    2
112² × 32      Bottleneck       1    16     1    1
112² × 16      Bottleneck       6    24     2    2
56² × 24       Bottleneck       6    32     3    2
28² × 32       Bottleneck       6    64     4    2
28² × 64       Bottleneck       6    96     3    1
14² × 96       Bottleneck       6    160    3    2
7² × 160       Bottleneck       6    320    1    1
7² × 320       Conv2d 1 × 1     –    1280   1    1
7² × 1280      Avgpool 7 × 7    –    –      1    –
1² × 1 × k     Conv2d 1 × 1     –    k      –    –
Table 2. Grading table of disease spots on sweetgum leaves.

Disease Spot Level    The Range of k     Quantity
level 1               0 ≤ k ≤ 2%         2115
level 2               2% < k ≤ 4%        1973
level 3               4% < k ≤ 6%        1897
level 4               6% < k             1695
Table 3. The results of comparative experiments of accuracy of different models.

Model         PA (%)
              Level 1    Level 2    Level 3    Level 4
Ours          97.3       96.8       96.5       93.7
DeeplabV3+    93.6       93.4       90.2       84.1
V-DV3+        94.8       94.5       92.3       85.7
AM-V-DV3+     96.2       95.8       94.1       91.5
Table 4. Ablation experiment results.

Scheme        PA (%)    mRecall (%)    mIou (%)    Number of Parameters (MB)    Training Time (h)
DeeplabV3+    90.3      81.1           78.9        209                          7.45
Scheme 1      90.8      81.6           79.1        24.2                         4.80
Scheme 2      92.3      83.5           79.3        26.3                         4.93
Scheme 3      91.6      82.3           79.5        26.7                         5.02
Scheme 4      93.5      84.7           80.4        27.2                         5.05
Table 5. Detailed comparison results of segmentation accuracy.

Scheme      PA (%)    Recall (%)                                    Iou (%)
                      Disease Spot    Background    mRecall (%)     Disease Spot    Background    mIou (%)
Scheme 4    93.5      82.4            86.9          84.7            76.7            84.1          80.4
Ours        94.5      84.6            86.2          85.4            77.3            85.3          81.3
Table 6. Comparison of evaluation indexes of different segmentation models.

Model         PA (%)    mRecall (%)    mIou (%)    Training Time (h)    Occupied Memory (GB)
DeeplabV3+    90.3      81.1           78.9        7.45                 8.7
Unet          88.3      78.7           76.4        6.30                 8.2
Segnet        89.7      80.3           79.4        4.22                 9.5
Ours          94.5      85.4           81.3        5.05                 7.5
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

Cai, M.; Yi, X.; Wang, G.; Mo, L.; Wu, P.; Mwanza, C.; Kapula, K.E. Image Segmentation Method for Sweetgum Leaf Spots Based on an Improved DeeplabV3+ Network. Forests 2022, 13, 2095. https://doi.org/10.3390/f13122095
