1. Introduction
The most significant challenges that any crop faces are diseases [1], pests [2], weeds [3], and nutritional deficiencies [4]. For instance, plant diseases [5] and plant pests [6] cause a 20–40% loss annually. Similarly, nutritional deficiency also influences the productivity of agricultural foods [7]. Farmers and domain experts have traditionally detected disorders manually by inspecting the plant’s leaves with the naked eye. However, this method has become infeasible due to the large size of fields, physical conditions, time, and cost [5]. Therefore, automatic, robust, precise, fast, and cost-effective methods for plant disorder identification have been in high demand in smart agriculture research in recent years.
Deep learning has made considerable progress in image-based classification problems [8,9]. A key benefit of deep learning is that it reduces the effort required for feature extraction, which is time-consuming and requires expertise. In this context, convolutional neural networks (CNNs) have achieved considerable success in image classification and object recognition. Deep CNN, an extended version of CNN, has been used in detection, classification, and recognition problems. However, training these models requires considerable training data and computing resources.
In the literature, deep learning-based methods are increasingly used to identify diseases associated with crops such as wheat, tomato, cucumber, apple, rice, pearl millet, citrus plants, and cassava. For instance, the study in [10] worked on identifying cassava plant disease. In this work, a CNN model was applied to a dataset of 720 images and classified seven cassava plant diseases and one healthy class. However, it achieved a lower classification rate when tested on real-time image classification. The authors in [2] proposed a deep-learning framework for pest and disease identification problems. They employed a CNN model to diagnose 27 plant diseases. The authors conducted a series of tests that revealed an overall detection accuracy of 86.1%. In another work, a DL-based CNN model was trained using 2029 images to detect five apple leaf diseases. Because it was trained on a small dataset, it reported a classification accuracy of 78.8%. Using an open source dataset, PlantVillage, the authors in [11] constructed deep CNN models for plant leaf disease detection. This dataset contains 54,306 images showing a total of 26 diseases collected from different plants in a lab environment. The authors in [12] worked on detecting numerous diseases across 12 crops using real-field images.
In addition to plant diseases, farmers are also confronted with pest attacks. Pests are environmental disasters that prevent plants from growing normally or even kill them. When pests attack crops, they leave certain patterns on the leaves. Detecting these pest patterns on leaves is a very challenging process. Therefore, deep learning has been introduced to detect the pest patterns on the leaves of crops [13]. The classification of plant pests using automatic deep learning-based methods has been performed in various studies. The authors in [13] detected the pests of the strawberry plant grown in a greenhouse environment. In this work, a classical machine learning-based support vector machine algorithm is applied to detect the housefly and whitefly pests of the strawberry plant. The authors in [14] proposed coffee tree disease identification using different classical and deep learning methods. In this study, InceptionV3 proved challenging for testing the dataset [15]. The study combined saliency techniques and CNNs to build an insect detection system. The system is 92.43% accurate on small datasets and 61.93% accurate on large datasets.
A deficiency emerges when a plant lacks a nutrient necessary for growth, and the plant then shows various signs of defects. Hence, detecting nutrient deficiencies early is critical to avoid severe losses. Deep learning frameworks have shown their performance in nutrient deficiency recognition [7]. In recent studies, work on nutrient deficiency classification problems can also be seen. The goal of the work in [7] was to provide a thorough review of the methods utilized to identify plant nutrient deficiencies using digital images. The authors in [16] detected nitrogen deficiencies in one variety of rice plant, capturing an image of 5 megapixels and gaining 0.92% accuracy. The authors in [17] detected seven types of nutrient deficiency with the ResNet-50 model using 4,088 images of black gram and reported an accuracy of 65.44%. By combining Inception-ResNet and an autoencoder, another system identified three nutritional deficiencies in 571 tomato plant images and obtained a 91% test accuracy. However, the dataset they employed is limited in scope, only covering N, Ca, and K nutrients [18].
The research community has presented extensive work on plant defect identification and recognition problems in recent decades. However, there is still a need for work on ginger plant defect identification, recognition, and classification. To address this gap for the ginger plant, we used various deep-learning models to classify ginger plant-associated disorders such as pest pattern, nutrient deficiency, and soft rot disease. This work is an extension of our previous work on the ginger plant [19]. In the former study, we proposed identifying and classifying ginger plant soft rot disease, nutritional deficiencies, and pest patterns at early as well as multiple stages using different deep learning-based models such as CNN, VGG-16, MobileNetV2, and ANN. The proposed deep learning models were trained and tested on a dataset of ginger plant leaf images consisting of healthy plants, pest patterns, nutrient deficiency, and soft rot disease acquired from an entire field of a standing crop. The study analyzed the performance and capability of the deep learning methods for ginger plant disorder detection. Here, the previous work is extended by considering the identification and classification of ginger plant disorders in real time. We developed an automatic, Android-based detection system that takes the leaf images of the crop in the field as input and provides real-time identification results. Moreover, the system also provides a recommendation to the end user based on the detected results. In addition, this study presents an in-depth analysis of the system’s performance in terms of time complexity and accuracy.
The key contributions of this paper are given as follows:
Creating a large dataset of ginger plant leaf images covering healthy leaves, pest patterns, nutrient deficiency, and soft rot disease.
Presenting ginger plant pest patterns, nutritional deficiencies, and soft rot disease identification by applying deep learning classification and detection models such as CNN, MobileNetV2, VGG-16, and YOLOv5.
Analyzing the performance of the proposed models in terms of time complexity and accuracy under different conditions.
Validating a deep learning-based detection platform that executes on smartphones in a real-time environment, generates identification results based on the given input, and recommends appropriate actions to the farmers.
The rest of the paper is organized as follows. Section 2 reviews the related work. Section 3 discusses the proposed methods, followed by the results and discussion in Section 4, and finally, Section 5 concludes the paper.
2. Related Work
This section describes various research studies that were carried out to detect plant leaf disorders using classical and advanced deep learning methods.
The spread of crop diseases might harm the economy. Traditionally, crop disease diagnosis is made manually, which is time-consuming and lacks accuracy. The authors in [20] performed image binarization, contour extraction, and fox fitting techniques using deep learning. This work utilized different deep learning models and achieved 73% accuracy. However, the adopted models showed poor performance on the other datasets. Moreover, the recommended model requires a lot of computation during training to identify various diseases.
The identification of citrus plant diseases was addressed in [21], where the authors employed K-means, classification, and neural network techniques. The study emphasized the traits, advantages, and disadvantages associated with citrus. The study also showed that new technologies would be needed to identify and categorize citrus plants in the future. Additionally, the authors mentioned that the automatic recognition and classification of citrus plants are still in their early stages.
In [22], the authors developed pipelines based on fuzzy, support vector machine, and neural network techniques for plant leaf disease detection. Although the study achieved significant results, the authors highlighted the pros and cons of computer vision-based methods in plant disease detection. This study also suggested exploring new tools and techniques for disease identification at different stages.
In [23], the authors employed an improved CNN for detecting wheat leaf disease through fine-grained image categorization. The suggested model was implemented with many neurons, data sources, and connecting channels. The findings showed that VGG16 with AlexNet can achieve approximately 90% accuracy. Furthermore, the study highlighted that other models, such as GANs, would produce better results using a large dataset.
In [24], the authors focused on the fusarium head blight disease that affects wheat crops. Images of wheat leaves are processed to identify the damaged area using CNN and image processing techniques. The model correctly identifies the crop’s diseased regions in training with a mean average precision of 92%. The outcomes surpass those of k-means and Otsu’s techniques. However, this approach needs vast datasets to diagnose the unhealthy components more accurately.
The CNN classification technique was exploited in study [25] to identify the strengths and weaknesses of works that used a CNN to detect crop diseases. The study proposed developing a more balanced and reliable agricultural tool for food production.
In [26], deep learning, machine learning, transfer learning, and deep convolutional neural networks were applied. The proposed model successfully classified 38 different disease classes with 96.46% accuracy, which exceeds that of standard machine learning techniques.
In [27], the diseases and pests of corn crops were captured at early stages from the field. The images were segmented by using image texture-based and iterative clustering methods. Then, the features obtained from the segmented images were used in the classification process, which was performed via a multi-class support vector machine (SVM). The results showed 52% accuracy for the pest detection problem. However, the most common pest attacking the corn crop is the aphid, which is not considered in this study.
In [28], the authors explored the application of deep learning to identify rice plant-associated diseases. This study demonstrated that ResNet-101, VGG-16, and YOLOv3 are robust to blurred and irregularly shaped images. However, when creating frames, they cannot extract the right frames because the main features of frames are fewer, while new frames are spectra that result in a waste of computation resources.
In [29], the authors worked on identifying ginger plant diseases at the initial stage. This work adopted traditional computer vision and image-processing techniques for leaf disorder identification. Farmers can capture plant leaves using the deployed system connected to a digital or web camera. Image-processing techniques are then used to determine the affected part. Farmers are informed of the disease type via a global system for mobile communications (GSM) interface. A relay then activates the device’s pump, releasing the appropriate medication to treat the affected plant’s condition. The implementation results show that SVM and k-means algorithms produce better results than traditional methods. However, the dataset used is insufficient to generalize the technique, and ginger diseases with pest attacks are not considered.
3. Materials and Methods
Figure 1 presents an overview of the research work conducted in this study. First, a dataset of ginger plant leaf images is collected and categorized into healthy, pest pattern, nutrient deficiency, and soft rot disease classes. Then, data pre-processing and augmentation are performed to strengthen the dataset. This is followed by training various deep-learning models on the processed dataset. Subsequently, the trained models are integrated into an Android application that classifies various ginger plant leaf disorders in real time. In the following subsections, we discuss the proposed methodology in more detail.
3.1. Ginger Plant Dataset
The dataset was collected from a field located at the orchard of PMAS-Arid Agriculture University Rawalpindi, Pakistan. The location of the field is presented in Figure 2. In this experiment, a total of 4394 digital images of ginger crop disorders were collected at early and multiple stages. The image dataset consists of 3 categories, namely soft rot disease, pest attacks, and nutrient deficiency. A summary of the collected ginger plant images is depicted in Figure 3.
We took the photos 4–5 months after planting (drill sowing at 50 cm). There were two small rows of ginger plants grown from seeds from China and Thailand. The images were captured manually in the presence of pathologists using an Infinix Hot 9 mobile phone. The sample inputs for each category are depicted in Figure 4.
During image acquisition, the following rules were considered:
The camera lens is kept at a distance of 30–45 cm;
We only target the affected part of the leaf;
We capture the top and back view of the affected part of a leaf.
3.2. Data Preparation
Image pre-processing plays a vital role in ginger plant disorder classification and identification tasks because the images differ in size and contain noise and blur. The images of the ginger plant were taken with one device, but image size (width and height) may vary due to differences in the camera’s distance from the plant’s leaves. Deep learning requires homogeneous images for better training and testing results. Therefore, pre-processing is necessary to eliminate noise and other external factors before passing the dataset into the model. All images in this study were resized using the OpenCV (cv2) library and saved in .jpg format. After resizing, all images were renamed using a Python script. Then, the images were converted into NumPy arrays using the NumPy Python library for normalization. These arrays of size 150 × 150 were used as the input to the models for feature extraction and classification.
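The resizing and normalization steps described above can be sketched as follows. This is only an illustration under stated assumptions: the study used the cv2 library for resizing, whereas the sketch substitutes a dependency-free nearest-neighbour resize written in NumPy, and scaling pixel values to [0, 1] is an assumed normalization that the text does not specify. The function names and the random stand-in photos are hypothetical.

```python
import numpy as np

TARGET_SIZE = (150, 150)  # (height, width) stated in the text


def resize_nearest(image: np.ndarray, size=TARGET_SIZE) -> np.ndarray:
    """Nearest-neighbour resize; a NumPy stand-in for cv2.resize (assumption)."""
    out_h, out_w = size
    in_h, in_w = image.shape[:2]
    # For every output row/column, pick the closest source row/column index.
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return image[rows[:, None], cols]


def preprocess(images) -> np.ndarray:
    """Resize each image and scale pixels from [0, 255] to [0, 1] (assumed)."""
    resized = [resize_nearest(img) for img in images]
    return np.stack(resized).astype(np.float32) / 255.0


# Hypothetical input: two RGB photos with different sizes.
rng = np.random.default_rng(0)
photos = [
    rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8),
    rng.integers(0, 256, size=(300, 200, 3), dtype=np.uint8),
]
batch = preprocess(photos)
print(batch.shape, batch.dtype)
```

Whatever the original photo sizes, every image ends up in one homogeneous array that a model with a 150 × 150 input layer can consume.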
In the experiments, a data augmentation process is applied to increase the volume and variety of the dataset. In this study, data are augmented with ImageDataGenerator, which applies rotation, flipping, horizontal shift, width shift, and zoom to increase the dataset size.
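To make the idea of augmentation concrete, the following sketch generates transformed copies of one image. It is not the Keras ImageDataGenerator used in the study: it is a simplified NumPy stand-in that restricts rotation to multiples of 90 degrees, wraps shifted pixels around the edge, and omits zoom. The function name, the 10% shift limit, and the 50% flip probability are assumptions chosen for illustration.

```python
import numpy as np


def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return one randomly transformed copy of an (H, W, C) image."""
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1]  # horizontal flip
    # Rotation restricted to 0, 90, 180, or 270 degrees (simplification).
    out = np.rot90(out, k=int(rng.integers(0, 4)))
    width = out.shape[1]
    max_shift = max(1, width // 10)  # assumed limit: 10% of the width
    shift = int(rng.integers(-max_shift, max_shift + 1))
    out = np.roll(out, shift, axis=1)  # width shift; pixels wrap around
    return np.ascontiguousarray(out)


rng = np.random.default_rng(42)
leaf = np.arange(150 * 150 * 3, dtype=np.int64).reshape(150, 150, 3)
copies = [augment(leaf, rng) for _ in range(4)]
print(len(copies), copies[0].shape)
```

Each call yields a new variant of the same leaf, so a dataset of N images can be expanded to several times N training samples without collecting new photos.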
3.3. Classification Approaches
Deep learning is widely used in machine vision and pattern recognition domains [30]. In contrast to traditional machine learning techniques, which require a largely manual process, deep learning-based methods can perform latent feature extraction autonomously [31]. This study implemented well-known deep learning models such as CNN and VGG-16 for ginger plant disorder classification.
A widely used model in deep learning is the CNN, which has an edge in image identification due to its enormous model capacity and the detailed information produced by its core structure. It is a complex network structure that performs convolution operations. Due to their better feature extraction abilities, CNN-based classification networks have been widely adopted and are now the most frequently used models in classifying plant diseases and pests. The CNN model has three levels: input, hidden, and output. In the feature extraction phase, convolution and pooling layers are typically followed by a fully connected layer and a sigmoid classification structure [18].
We also implemented the VGG-16 model for ginger plant disease, nutritional deficiency, and pest attack detection. VGG-16 is a CNN architecture that was entered in ILSVRC 2014 and has over 138 million parameters. The most significant feature of this architecture is its uniform design: the convolutional layers always use a 3 × 3 filter with stride 1 and same padding, and the maximum pooling layers always use a 2 × 2 filter with stride 2. The VGG-16 architecture adheres to this arrangement of convolution and maximum pooling layers throughout. Of the three fully connected (FC) layers at the end, the first two use ReLU and the last uses a sigmoid activation function. The 416 × 416 image size is given to the input layer in this 16-layer design [30]. Recent research has demonstrated the efficiency of the VGG-16 network in identifying the images of affected crops [32]. In our experimental study, CNN models also demonstrated substantial results in plant disease classification [33,34]. We selected these models for this experiment based on prior investigations. We assessed and compared the models’ behavior on the image dataset of the ginger crop acquired in the field. We adopted the standard training strategy to train the model layers on our dataset. Weights were randomly initialized rather than using pre-trained weights in training the models. The models were trained on the training and validation datasets using a batch size of 120 for 42 epochs with the “Adam” optimizer, keeping its default parameters and a learning rate of 0.001. The binary cross-entropy function was used as the loss function throughout the training phase.
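The figure of over 138 million parameters can be checked with a short calculation. The sketch below counts the weights and biases of the standard VGG-16 layout, assuming the original ILSVRC configuration (224 × 224 RGB input, 3 × 3 convolutions, 2 × 2 max pooling, two 4096-unit FC layers, and a 1000-class output) rather than the configuration trained in this study. The function and constant names are hypothetical.

```python
# Output channels of the 13 convolutional layers; "M" marks a 2 x 2 max pooling.
VGG16_LAYOUT = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                512, 512, 512, "M", 512, 512, 512, "M"]


def count_vgg16_params(input_size=224, in_channels=3,
                       fc_units=(4096, 4096), num_classes=1000):
    """Count trainable parameters of a VGG-16-style network."""
    total = 0
    channels = in_channels
    size = input_size
    for item in VGG16_LAYOUT:
        if item == "M":
            size //= 2  # pooling with stride 2 halves each spatial side
        else:
            # 3 x 3 kernel weights for every input/output channel pair, plus biases
            total += 3 * 3 * channels * item + item
            channels = item
    features = size * size * channels  # flattened feature map fed to the FC layers
    for units in (*fc_units, num_classes):
        total += features * units + units
        features = units
    return total


print(count_vgg16_params())  # 138357544
```

Most of the parameters sit in the first fully connected layer, which is why the total changes noticeably when the input size or the classifier head is altered.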
In addition, we implemented a transfer learning approach (pre-trained network), which is prevalent nowadays. A deep learning-based MobileNetV2 network was applied to our image-based ginger crop dataset. MobileNetV2 was developed from MobileNetV1 [31], with the addition of inverted residuals and linear bottleneck modules. The basis of the MobileNet architecture is depth-wise convolution [31]. The model takes an input image of 150 × 150 pixels. The pre-trained weights were used in this study, and the other layers were frozen. The model provides the input layer, and the sigmoid function is utilized in the final (output) layer. We deployed a model pre-trained on ImageNet using the Keras library. The performance of the deployed model is evaluated and compared with the CNN and VGG-16 models.
Table 1 details the hyperparameters used during implementation.
3.4. Detection Approach
Deep learning models are also very useful in detecting and localizing the affected part of the input image. In this regard, various state-of-the-art models are available in the literature. In this work, we tested one of the commonly used detection models, YOLOv5, for pest pattern, nutrient deficiency, and healthy leaf detection. Furthermore, the model is also tested for detecting soft rot disease and healthy samples using the seed images collected in the field.
The working of the deployed YOLOv5n model is explained through its network architecture (see Figure 5). The model mainly consists of three components, namely the backbone, neck, and head. CSPDarknet [35] is used for feature extraction at the backbone. It addresses the repetition of gradient information during network training by fusing gradient changes into the feature map from start to finish. This reduces the model parameters while increasing the detection performance. In this part, two CSP structures (one residual and one non-residual) are used. In addition, spatial pyramid pooling (SPP) [36] is also used to solve anchor and feature map alignment issues. Next, the neck layer is a combination of FPN and PANet [37]. This layer is responsible for feature fusion and performs multi-scale prediction across various layers. This helps to enhance semantic representation and localization at different scales. Moreover, the CBL layers are further concatenated in the last step in order to extract the pixel information for mask formation. In the prediction part, the model uses a joint loss function that combines bounding box regression, classification, and confidence terms and is expressed as follows:
$$L = L_{cls} + L_{box} + L_{conf}$$
where $L_{cls}$ is used to represent the classification error, $L_{box}$ represents the bounding box regression error, and $L_{conf}$ shows the confidence error.
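The $L_{cls}$ is computed as follows:
$$L_{cls} = -\sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{k \in classes} \left[ p_i(k) \log \hat{p}_i(k) + (1 - p_i(k)) \log (1 - \hat{p}_i(k)) \right]$$
where $\mathbb{1}_{i}^{obj}$ takes either 1 or 0 for class objects and $\hat{p}_i(k)$ and $p_i(k)$ show predicted and true probability. The $L_{box}$ is computed as follows:
$$L_{box} = 1 - GIoU, \qquad GIoU = IoU - \frac{C - U}{C}, \qquad IoU = \frac{I}{U}$$
where it is also known as a generalized intersection over union (GIoU) localization loss. It is useful to locate a closed bounding box bounded by the predicted box $B^{p}$ and the true box $B^{gt}$. Here, $U$, $C$, and $I$ are used for the union area of the two boxes, the area of the closed bounding box, and the overlapping area, respectively. The intersection ratio of the estimated and real area in the image frame is represented by $IoU$.
$L_{conf}$ is calculated as follows:
$$L_{conf} = \sum_{i=0}^{S^2} \sum_{j=0}^{m} \mathbb{1}_{ij}^{obj} L_{BCE}(\hat{c}_i, c_i)$$
$S^2$ depicts image partitioning into $S \times S$ grids, yielding $m$ candidate anchors. $\mathbb{1}_{ij}^{obj}$ denotes negative or positive samples. The confidence levels of the $i$th predicted bounding box and the true bounding box are represented by $\hat{c}_i$ and $c_i$, respectively. Moreover, $L_{BCE}$ indicates a binary cross-entropy loss and is defined by:
$$L_{BCE}(\hat{c}, c) = -\left[ c \log \hat{c} + (1 - c) \log (1 - \hat{c}) \right]$$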
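As an illustration of the box regression and confidence terms discussed in this subsection, the sketch below computes a GIoU loss for two axis-aligned boxes and a binary cross-entropy value for a single prediction. It follows the standard textbook definitions of GIoU and binary cross-entropy, not the YOLOv5 source code. The corner-coordinate box format (x1, y1, x2, y2), the function names, and the clipping constant are assumptions.

```python
import math


def giou_loss(box_p, box_t):
    """GIoU loss for boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    px1, py1, px2, py2 = box_p
    tx1, ty1, tx2, ty2 = box_t
    # Overlapping area (zero when the boxes do not intersect).
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    # Union area of the two boxes.
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    # Area of the smallest box that encloses both boxes.
    enclose = (max(px2, tx2) - min(px1, tx1)) * (max(py2, ty2) - min(py1, ty1))
    iou = inter / union
    giou = iou - (enclose - union) / enclose
    return 1.0 - giou


def bce(pred, target, eps=1e-7):
    """Binary cross-entropy for one predicted probability and a 0/1 target."""
    pred = min(max(pred, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(target * math.log(pred) + (1 - target) * math.log(1 - pred))


print(giou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes give 0.0
print(giou_loss((0, 0, 2, 2), (1, 1, 3, 3)))  # partial overlap
print(giou_loss((0, 0, 1, 1), (2, 0, 3, 1)))  # disjoint boxes, loss above 1
print(bce(0.9, 1), bce(0.9, 0))
```

Unlike a plain IoU loss, the GIoU loss keeps growing as disjoint boxes move further apart, which gives the regression a useful signal even when the prediction does not overlap the target at all.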
3.5. Implementation
All the models were deployed under the “Keras” framework, a high-level Python interface for developing and deploying various neural network models [9]. A high-speed GPU was used for the experiments. Table 2 shows the system specification and configuration during the experiments.
After that, for detection purposes, the YOLOv5 model was deployed on a desktop platform running Ubuntu 20.04 with PyTorch and the YOLOv5 environment. Furthermore, CUDA 11.3 and cuDNN 8.2.0 are used in conjunction with a GeForce RTX 3080 Ti 32 GB on an Intel Core i7-12700K × 20. During the experiments, the model was trained with the Adam optimizer for 40 epochs with a batch size of 64 and a learning rate of 0.004. The dataset was split into 90% training and 10% validation. In addition, we used a pre-trained model, the YOLOv5n version.
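The 90%/10% split mentioned above can be sketched as follows. This is only a minimal illustration: the file names, the fixed random seed, and the function name are hypothetical, and the study does not state how its split was produced.

```python
import random


def split_dataset(paths, train_fraction=0.9, seed=0):
    """Shuffle the image paths reproducibly and split them into train/validation."""
    paths = sorted(paths)  # fixed starting order so the seed fully determines the split
    random.Random(seed).shuffle(paths)
    cut = round(len(paths) * train_fraction)
    return paths[:cut], paths[cut:]


# Hypothetical file names standing in for the annotated images.
files = [f"img_{i:04d}.jpg" for i in range(1000)]
train_files, val_files = split_dataset(files)
print(len(train_files), len(val_files))  # 900 100
```

Fixing the seed keeps the validation set identical across training runs, so that accuracy differences reflect the model settings rather than a different sample of images.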
The adopted model is evaluated on our developed ginger dataset, which consists of leaf images of the healthy, pest pattern, and nutrient deficiency classes, and rhizome images of the healthy and soft rot disease classes. We performed an experimental study on the ginger dataset in order to verify the applicability of the YOLOv5 models in addition to the classification models presented earlier in our study.