1. Introduction
Potato late blight disease, caused by Phytophthora infestans (Mont.) de Bary, is one of the most destructive potato diseases, resulting in significant yield loss across the major potato-growing areas worldwide [1,2]. The yield loss due to late blight infestation is substantial [3,4]. The current control measure relies mainly on the application of fungicides [5], which is expensive and has negative impacts on the environment and human health due to the excessive use of pesticides. Therefore, the early, accurate detection of potato late blight disease is vital for effective disease control and management with minimal application of fungicides.
Since late blight disease affects potato leaves, stems and tubers with visible symptoms (e.g., black lesions with granular regions and a green halo) [6,7], the current detection of late blight disease in practice is mainly based on visual observation [8,9]. However, this manual inspection method is time-consuming and costly and often causes a delay in late blight disease management, especially at an early stage, across large fields [10]. In addition, field surveyors diagnose diseases based on their domain knowledge, which may introduce inconsistency and bias due to individual subjectivity [11]. An automated approach for fast and reliable potato late blight disease diagnosis is therefore important to ensure effective disease management and control.
With the advancements in low-cost sensor technology, computer vision and remote sensing, machine vision technology based on images (such as red, green and blue (RGB) images, thermal images, and multispectral and hyperspectral images) has been successfully used in agricultural and engineering fields [12,13,14,15,16,17,18,19,20,21]. For example, Wu et al. [20] developed a deep learning-based model to detect the edge images of flower buds and inflorescence axes and successfully applied this algorithm to a banana bud-cutting robot for real-time operation. Cao et al. [21] developed a multi-objective particle swarm optimizer for a multi-objective trajectory model of a manipulator, which improved the stability of the fruit-picking manipulator and facilitated nondestructive picking. Particularly, in the area of automated crop disease diagnosis [22,23], Unmanned Aerial Vehicles (UAVs) equipped with RGB cameras and thermal sensors have been used for plant physiological monitoring (e.g., transpiration, leaf water, etc.) [13]. Li et al. [24] acquired potato biomass-associated spatial and spectral features from UAV-based RGB and hyperspectral imagery, respectively, and then fed them into a random forest (RF) model to predict potato yield. Wan et al. [25] fused the spectral and structural information from multispectral imagery into a multi-temporal vegetation index model to predict rice grain yield.
Benefiting from many narrow spectral bands over a contiguous spectral range, hyperspectral imagery (HSI) provides spatial information in two dimensions and rich spectral information in the third dimension, capturing detailed spectral–spatial information of the disease infestation and offering the potential for better diagnostic accuracy [26,27]. However, extracting effective infestation features from the abundant spectral and spatial information in hyperspectral images is a key challenge for disease diagnosis. Based on the features used in HSI-based disease detection, the existing models can be divided into three categories: spectral feature-based approaches, focusing on spectral signatures composed of the radiation signal of each pixel of an image scene in various spectral ranges [28,29,30]; spatial feature-based approaches, focusing on features such as shape, texture and geometrical structures [31,32,33,34]; and joint spectral–spatial feature-based approaches, focusing on a combination of spectral and spatial features [35,36,37,38,39,40,41,42]. A detailed discussion of these methods can be found in Section 2.
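These three feature categories map directly onto how an HSI cube is indexed: a pixel's spectrum, a single-band neighbourhood, or a full sub-cube. The following minimal NumPy sketch illustrates this with purely synthetic data (the array sizes and indices are arbitrary, not those of the study's imagery):

```python
import numpy as np

# Hypothetical HSI cube: 64 x 64 pixels, 200 contiguous spectral bands
# (axis order rows x cols x bands; sizes are illustrative only).
rng = np.random.default_rng(0)
cube = rng.random((64, 64, 200))

# Spectral feature of one pixel: its full reflectance signature (1D).
spectrum = cube[10, 20, :]

# Spatial feature at one band: a 5x5 neighbourhood patch (2D).
patch = cube[8:13, 18:23, 50]

# Joint spectral-spatial feature: a small 3D sub-cube around the pixel.
subcube = cube[8:13, 18:23, :]

print(spectrum.shape, patch.shape, subcube.shape)  # (200,) (5, 5) (5, 5, 200)
```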
Although these existing works are encouraging, the existing models do not consider the hierarchical structure of the spectral and spatial information of crop diseases (for instance, canopy structural information and the reflectance radiation variance of ground objects hidden in HSI data), which comprises important indicators for crop disease diagnosis. In fact, changes in reflectance due to plant pathogens and diseases are highly disease-specific, since the optical properties of plant diseases are related to a number of factors, such as foliar pathogens, canopy structure, pigment content, etc.
Therefore, to address the issue presented above, the hierarchical structure of the spectral–spatial features should be considered in the learning process. In this paper, we propose a novel CropdocNet for the automated detection and discrimination of potato late blight disease. The contributions of the proposed work include the following:
The development of an end-to-end deep learning framework (CropdocNet) for potato disease detection.
The introduction of multiple capsule layers to handle the hierarchical structure of the spectral–spatial features extracted from HSI data.
The combination of the spectral–spatial features to represent the part-to-whole relationship between the deep features and the target classes (i.e., healthy potatoes and potatoes infested with late blight disease).
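The capsule layers named in the contributions replace scalar activations with vector features whose length can be read as class evidence while the direction encodes pose. A minimal NumPy sketch of the standard capsule "squash" nonlinearity from the general capsule network literature (an illustration of the concept, not the exact CropdocNet implementation):

```python
import numpy as np

def squash(v, eps=1e-8):
    """Capsule 'squash' nonlinearity: shrinks a vector's length into [0, 1)
    while preserving its direction, so length can act as class probability."""
    norm2 = np.sum(v ** 2, axis=-1, keepdims=True)
    return (norm2 / (1.0 + norm2)) * v / np.sqrt(norm2 + eps)

u = np.array([3.0, 4.0])     # raw capsule vector, length 5
s = squash(u)
print(np.linalg.norm(s))     # ~0.96: long inputs saturate toward 1
```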
The remainder of this paper is organized as follows: Section 2 describes the related work; Section 3 describes the study area, data collection and the proposed model; Section 4 presents the experimental results; Section 5 provides a discussion; and Section 6 summarizes this work and highlights future work.
2. Related Work in Crop Disease Detection Based on Hyperspectral Imagery
In this section, we mainly discuss related work in crop disease detection based on hyperspectral imagery (HSI). Based on features used for HSI-based crop disease detection, there are broadly three main categories: spectral feature-based approaches, spatial feature-based approaches and joint spectral–spatial feature-based approaches.
Table 1 summarizes the existing models for potato late blight disease detection based on the different features used in the machine learning process, providing a baseline for hyperspectral imagery-based late blight disease detection. A detailed review of each category is given below.
The category of spectral feature-based approaches exploits the spectral features associated with plant diseases, which represent the biophysical and biochemical status of plant leaves in the spectral domain of HSI [28,29,30]. For example, Nagasubramanian et al. [43] found that the spectral bands associated with the depth of chlorophyll absorption are very sensitive to the occurrence of plant diseases, and they extracted the optimal spectral bands as the input of a Genetic Algorithm (GA)-based SVM for the early identification of charcoal rot disease in soybean, achieving high classification accuracy. Huang et al. [44] extracted 12 sensitive spectral features for Fusarium head blight, which were then fed into an SVM model to diagnose the severity of Fusarium head blight with good performance.
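The GA-based band selection of [43] is beyond a short snippet, but the underlying idea — ranking bands by how well they separate healthy from diseased spectra — can be sketched with a simple per-band Fisher ratio on synthetic spectra. This is an illustrative stand-in, not the method of [43]; the band indices and noise levels are invented:

```python
import numpy as np

def fisher_ratio(healthy, diseased):
    """Per-band Fisher discriminant ratio: between-class separation over
    within-class scatter. Higher = more disease-sensitive band.
    Inputs are (n_samples, n_bands) reflectance matrices."""
    m1, m2 = healthy.mean(axis=0), diseased.mean(axis=0)
    v1, v2 = healthy.var(axis=0), diseased.var(axis=0)
    return (m1 - m2) ** 2 / (v1 + v2 + 1e-12)

rng = np.random.default_rng(1)
healthy = rng.normal(0.5, 0.05, (30, 100))
diseased = rng.normal(0.5, 0.05, (30, 100))
diseased[:, 40:45] += 0.2      # simulate a disease-sensitive spectral region

scores = fisher_ratio(healthy, diseased)
top_bands = np.argsort(scores)[::-1][:5]
print(sorted(top_bands))       # the simulated sensitive bands 40-44 dominate
```

The selected bands would then serve as the reduced input to a downstream classifier such as an SVM.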
The category of spatial feature-based approaches exploits the spatial texture of the hyperspectral image, which represents foliar contextual variances, such as color, density and leaf angle, and is one of the important factors for crop disease diagnosis [31,32,33,34]. For example, Mahlein et al. [45] summarized the spatial features of RGB, multispectral and hyperspectral images used in the automatic detection of plant diseases. Their study showed that the spatial properties of crop leaves are affected by leaf chemical parameters (e.g., pigments, water, sugars, etc.) and light reflected from internal leaf structures. For instance, the spatial texture of the hyperspectral bands from 400 to 700 nm is mainly influenced by foliar content, while the spatial texture of the bands from 700 to 1100 nm reflects the leaf structure and internal scattering processes. Yuan et al. [46] introduced the spatial texture of satellite data into the spectral angle mapper (SAM) to monitor wheat powdery mildew at the regional level.
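The spectral angle mapper used in [46] compares each pixel spectrum to a reference spectrum by the angle between them, which makes it insensitive to overall illumination scaling. A minimal sketch with invented four-band spectra:

```python
import numpy as np

def spectral_angle(pixel, reference):
    """Spectral angle (radians) between a pixel spectrum and a reference
    spectrum; a small angle means a similar material, regardless of brightness."""
    cos = np.dot(pixel, reference) / (np.linalg.norm(pixel) * np.linalg.norm(reference))
    return np.arccos(np.clip(cos, -1.0, 1.0))

ref = np.array([0.2, 0.4, 0.6, 0.8])
same_shape = 2.5 * ref                   # brighter pixel, same spectral shape
other = np.array([0.8, 0.6, 0.4, 0.2])  # a different spectral signature

print(spectral_angle(same_shape, ref))   # ~0.0: scale-invariant match
print(spectral_angle(other, ref) > 0.3)  # True: clearly different signature
```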
In the category of joint spectral–spatial feature-based approaches, there are two main strategies for extracting joint spectral–spatial features to represent the characteristics of crop diseases in HSI data. The first strategy is to extract spatial and spectral features separately and then combine them based on 1D or 2D approaches (e.g., feature stacking, convolutional filters, etc.) [40,41,42]. For example, Xie et al. [47] investigated the spectral and spatial features extracted from hyperspectral imagery to detect early blight disease on eggplant leaves, and then stacked these features as the input of an AdaBoost model to separate healthy and infected samples. The second strategy is to jointly extract the correlated spectral–spatial information of the HSI cube through 3D kernel-based approaches [48,49,50]. For instance, Nguyen et al. [51] tested the performance of a 2D convolutional neural network (2D-CNN) and a 3D convolutional neural network (3D-CNN) for the early detection of grapevine viral diseases. Their findings demonstrated that the 3D convolutional filter produced promising results compared with the 2D convolutional filter on hyperspectral cubes. Benefiting from the advanced self-learning performance of the 3D convolutional kernel, the depth of the 3D convolutional kernel has also been investigated for crop disease diagnosis [35,36,37,38,39]. For instance, Suryawati et al. [52] compared CNN baselines with depths of 2, 5 and 13 3D convolutional layers, and their findings suggested that the deeper architectures achieved higher accuracy for plant disease detection tasks. Nagasubramanian et al. [53] developed a 3D deep convolutional neural network (DCNN) with eight 3D convolutional layers to extract deep spectral–spatial features representing inoculated stem images from soybean crops. Kumar et al. [54] proposed a 3D CNN with six 3D convolutional layers to extract the spectral–spatial features of various crop diseases.
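The 3D kernels discussed above slide over both spatial dimensions and the spectral dimension at once, so each output value mixes spectral and spatial context. A deliberately naive NumPy sketch of a single "valid" 3D convolution (loop-based for clarity; real 3D-CNNs use optimized framework operators and learned kernels, not the averaging kernel assumed here):

```python
import numpy as np

def conv3d_valid(cube, kernel):
    """Single 3D convolution ('valid' padding) over an HSI cube, as used by
    3D-CNNs to learn joint spectral-spatial features in one operation.
    cube: (H, W, B); kernel: (kh, kw, kb)."""
    kh, kw, kb = kernel.shape
    H, W, B = cube.shape
    out = np.zeros((H - kh + 1, W - kw + 1, B - kb + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(cube[i:i+kh, j:j+kw, k:k+kb] * kernel)
    return out

cube = np.arange(6 * 6 * 10, dtype=float).reshape(6, 6, 10)
kernel = np.ones((3, 3, 3)) / 27.0      # a 3x3x3 averaging filter
fmap = conv3d_valid(cube, kernel)
print(fmap.shape)                       # (4, 4, 8): spatial and spectral dims shrink
```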
However, these existing methods fail to model the various kinds of reflectance radiation of crop diseases and the hierarchical structure of the disease-specific features, which are affected by a particular combination of multiple factors, such as foliar biophysical variations, the appearance of typical fungal structures and canopy structural information, from region to region [27]. A reason behind this is that the convolutional kernels in the existing CNN methods are independent of each other, making it hard to model the part-to-whole relationship of the spectral–spatial features and to characterize the complexity and diversity of potato late blight disease in HSI data [36]. Therefore, this study proposes a novel end-to-end deep learning model that addresses these limitations by considering the hierarchical structure of the spectral–spatial features associated with plant diseases.
5. Discussion
The hierarchical structure of the spectral–spatial information extracted from HSI data has been proven to be an effective way to represent the invariance of the target entities in HSI [36]. In this paper, we propose the CropdocNet model to learn the late blight disease-associated hierarchical structure information from UAV HSI data, providing more accurate crop disease diagnosis at the farm scale. Unlike the traditional scalar features used in existing machine learning/deep learning approaches, our proposed method introduces capsule layers to learn the hierarchical structure of the late blight disease-associated spectral–spatial characteristics, which allows the capture of the rotation invariance of late blight disease under complicated field conditions, leading to improvements in the model's accuracy, robustness and generalizability.
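The part-to-whole relationship learned by capsule layers is typically realized through routing-by-agreement, in which coupling coefficients between lower-level (part) capsules and higher-level (whole) capsules grow where their predictions agree. A NumPy sketch of this generic procedure, after Sabour et al.'s dynamic routing, with invented capsule counts and dimensions (not the exact CropdocNet routing):

```python
import numpy as np

def squash(v, eps=1e-8):
    n2 = np.sum(v ** 2, axis=-1, keepdims=True)
    return (n2 / (1.0 + n2)) * v / np.sqrt(n2 + eps)

def route(u_hat, n_iter=3):
    """Dynamic routing-by-agreement between capsule layers.
    u_hat: (n_in, n_out, dim) prediction vectors from lower-level (part)
    capsules for each higher-level (whole) capsule."""
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                                # routing logits
    for _ in range(n_iter):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)   # softmax over outputs
        s = np.einsum('io,iod->od', c, u_hat)                  # weighted vote per output
        v = squash(s)
        b = b + np.einsum('iod,od->io', u_hat, v)              # reward agreement
    return v                                                   # output capsule vectors

rng = np.random.default_rng(3)
u_hat = rng.normal(size=(8, 2, 4))     # 8 part capsules, 2 classes, 4-dim vectors
v = route(u_hat)
print(np.linalg.norm(v, axis=-1))      # vector lengths in [0, 1): per-class evidence
```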
To trade off accuracy against computing efficiency, the effects of the depth of the convolutional filters were investigated. Our findings suggest that there is no obvious improvement in accuracy once the depths of the 1D and 3D convolutional kernels exceed certain values. We also find that using multi-scale capsule units can improve the model's performance on HSI-based potato late blight disease detection.
To investigate the effectiveness of using the hierarchical vector features for accurate disease detection, we compared the proposed model with three typical machine learning models that consider only spectral or spatial scalar features. The results illustrate that the proposed model outperforms the traditional models in terms of overall accuracy, average accuracy, sensitivity and specificity on both the training dataset (collected under controlled field conditions) and the independent testing dataset (collected under natural conditions). In addition, the classification differences between the proposed model and the existing models are statistically significant according to McNemar's Chi-squared test.
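McNemar's Chi-squared test compares two classifiers on the same test samples using only their disagreements. A minimal sketch with illustrative outcomes (the counts below are invented, not the study's results):

```python
import numpy as np

def mcnemar_chi2(model_a_correct, model_b_correct):
    """McNemar's Chi-squared statistic (with continuity correction) for
    comparing two classifiers on the same samples. Inputs are boolean
    arrays: whether each model classified each sample correctly."""
    a, b = np.asarray(model_a_correct), np.asarray(model_b_correct)
    n01 = np.sum(a & ~b)     # A correct, B wrong
    n10 = np.sum(~a & b)     # A wrong, B correct
    return (abs(n01 - n10) - 1) ** 2 / (n01 + n10)

# Illustrative outcomes on 100 shared samples: model A wins the disagreements.
a_correct = np.array([1] * 90 + [0] * 10, dtype=bool)
b_correct = np.array([1] * 70 + [0] * 30, dtype=bool)
chi2 = mcnemar_chi2(a_correct, b_correct)
print(chi2 > 3.84)   # True: exceeds the 0.05 critical value of Chi2 with 1 d.o.f.
```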
5.1. The Assessment of the Hierarchical Vector Feature
To further visually demonstrate the benefit of using hierarchical vector features in the proposed CropdocNet model, we compared the visualized feature space and the mapping results of the healthy plots (see the first row of Figure 7) and diseased plots (see the second row of Figure 7) from three models: SVM, 3D-CNN and the proposed CropdocNet model. Our quantitative assessment reveals that CropdocNet achieves the highest accuracy on the potato late blight disease plots, followed by 3D-CNN and then SVM. Specifically, for the SVM-based model, which only maps the spectral information into the feature space, a considerable proportion of the areas in the healthy plots is misclassified as potato late blight disease (see the left subgraph of Figure 7b), and the feature space of the samples in the yellow frame, shown in the right subgraph of Figure 7b, explains the reason for these misclassifications: no cluster characteristics can be observed between the spectral features in the SVM-based feature space, indicating that the inter-class spectral variances are not significant in the SVM decision hyperplane.
In contrast, the spectral–spatial information-based 3D-CNN (Figure 7c) performs better than the SVM-based model. However, looking at the edges of the plots, there are obvious misclassifications. The right subgraph of Figure 7c shows the averages and standard deviations of the activated high-level features of the samples within the yellow frame. It is worth noting that, for the healthy potatoes (the first row of Figure 7c), the average values of the activated joint spectral–spatial features for the different classes are quite close, and the standard deviations are relatively high, illustrating that the inter-class distances between healthy potatoes and potato late blight disease are not significant in the feature space. Similar results can be found for the late blight disease plots (see the second row of Figure 7c). Thus, no significant inter-class separability can be represented in the joint spectral–spatial feature space, owing to the mixed spectral–spatial signatures of plants and the background.
In comparison, the hierarchical vector feature-based CropdocNet model provides more accurate classification because the hierarchical structural capsule features can express the various spectral–spatial characteristics of the target entities. For example, the white panels in the diseased plot (see the second row of Figure 7d) are successfully classified as background. The right subgraphs of Figure 7d show the average, direction and standard deviations of the activated hierarchical capsule features of the samples within the yellow frame. It is noteworthy that the average length and direction of the activated features for the different classes are quite different, and the standard deviations (see the shading under the arrows) do not overlap with each other. These results demonstrate the significant clustering of each class in the hierarchical capsule feature space; thus, the hierarchical vector features are capable of capturing most of the spectral–spatial variability found in practice.
5.2. The General Comparison of CropdocNet and the Existing Models
For an indirect comparison between the proposed CropdocNet model and the existing case studies, we have compiled Table 6, which presents the accuracy and computing efficiency. As shown in Table 6, our proposed CropdocNet model achieves the best accuracy among the compared works. In terms of computing efficiency, due to the deep-layered network architecture and large-scale samples, the deep learning models (3D-CNN and CropdocNet) require more computing time than traditional machine learning methods (such as SVM and RF), which use fewer samples.
5.3. Limitations and Future Works
Benefiting from the hierarchical capsule features, the proposed CropdocNet model performs better for potato late blight disease detection than the existing spectral-based or spectral–spatial-based deep/machine learning models, and the generalizability of its network architecture is better than that of the existing models. The previous experimental evaluation has demonstrated the robustness and generalizability of our proposed model. Our model can be adapted to the detection of other crop diseases, since our proposed method introduces capsule layers to learn the hierarchical structure of the disease-associated spectral–spatial characteristics, which allows for the capture of the rotation invariance of diseases under complicated conditions. However, it is worth mentioning that our current input data for model training are mainly based on the full-bloom period of potato growth, when canopy closure reaches its maximum and the field microclimate is most suitable for the occurrence of late blight disease; thus, the direct use of the pre-trained model at other stages may lead to limited performance. The reason is that hyperspectral imagery is generally influenced by the mixed-pixel effect, which depends on crop growth and stress types. Therefore, in future studies, we will validate the proposed model on more UAV-based HSI data covering various potato growth stages and various diseases. Specifically, we will further test the receptive field of CropdocNet and fine-tune the model on HSI data to enhance performance under various field conditions.