Article

Deep Learning in Precision Agriculture: Artificially Generated VNIR Images Segmentation for Early Postharvest Decay Prediction in Apples

1 Skolkovo Institute of Science and Technology, 121205 Moscow, Russia
2 Saint-Petersburg State University of Aerospace Instrumentation (SUAI), 190000 Saint-Petersburg, Russia
3 Department of Information Technology and Data Science, Irkutsk National Research Technical University, 664074 Irkutsk, Russia
* Author to whom correspondence should be addressed.
Entropy 2023, 25(7), 987; https://doi.org/10.3390/e25070987
Submission received: 5 May 2023 / Revised: 19 June 2023 / Accepted: 22 June 2023 / Published: 28 June 2023

Abstract:
Food quality control is an important task in the agricultural domain at the postharvest stage for avoiding food losses. The latest achievements in image processing with deep learning (DL) and computer vision (CV) approaches provide a number of effective tools based on image colorization and image-to-image translation for plant quality control at the postharvest stage. In this article, we propose an approach based on Generative Adversarial Network (GAN) and Convolutional Neural Network (CNN) techniques that uses synthesized and segmented VNIR imaging data for early postharvest decay and fungal zone prediction, as well as for the quality assessment of stored apples. The Pix2PixHD model achieved the best results for VNIR image translation from RGB (SSIM = 0.972). The Mask R-CNN model was selected as the CNN technique for VNIR image segmentation and achieved F1-scores of 58.861 for postharvest decay zones, 40.968 for fungal zones, and 94.800 for the joint detection and prediction of decayed and fungal zones in stored apples. In order to verify the effectiveness of this approach, a unique paired dataset containing 1305 RGB and VNIR images of apples of four varieties was obtained. It was further utilized for GAN model selection. Additionally, we acquired 1029 VNIR images of apples for training and testing the CNN model. We conducted validation on an embedded system equipped with a graphical processing unit. Using Pix2PixHD, 100 VNIR images were generated from RGB images at a rate of 17 frames per second (FPS). Subsequently, these images were segmented using Mask R-CNN at a rate of 0.42 FPS. The achieved results are promising for enhancing food study and control during the postharvest stage.

1. Introduction

According to the data provided by the United Nations, the human population has grown to 8 billion people [1], and it is expected to increase to 9.8 billion by 2050 [2]. The growing population will need more sustainable and affordable food sources, which increases the importance of agriculture in the light of sustainable development. In terms of food production and quality control, agricultural challenges can be divided into the preharvesting, harvesting and postharvesting stages [3]. Each stage includes various factors that should be taken into account in order to minimize food losses. During the postharvest stage, farmers primarily concentrate on factors that impact the shelf-life of harvested products during storage and transportation. These factors include temperature [4], humidity [5], as well as the use of gases and chemicals in food containers [6,7]. Each crop has its own set of factors affecting the shelf-life during the postharvest stage, and these factors should also be taken into account [8]. Disregard of one of these factors or violations during storage or transportation may result in postharvest losses of food products. Examples of postharvest losses in stored fruits and vegetables include decayed and spoiled areas, often attributed to mishandling, hygiene issues, inadequate humidity control, improper temperature management, and mechanical damage [9]. These factors contribute to the deterioration and loss of quality of stored produce.
Apple is one of the most popular harvested and cultivated crops. Its global production reached 93 million tonnes in 2021 [10]. This is one of the major reasons to monitor apple fruit quality during all the above-mentioned stages to prevent postharvest losses and to avoid potential economic losses. However, there are specific factors affecting apple quality during the postharvest stage, e.g., water loss in apple fruits [11], residual pesticides [12], or the concentration of carbon dioxide, ethylene, ethanol or ammonia surrounding apples due to insufficient ventilation in the storage facility [13]. The most common non-destructive methods for preventing postharvest losses include the control of objects using RGB video cameras and sensors [14], near infrared (NIR) data [15], gas sensing spectroscopy [13], fluorescence spectroscopy [16], magnetic resonance imaging (MRI) [17], and even an electronic nose [18]. Nevertheless, postharvest losses are still estimated in the range of 40–50% [9]. It should be noted that the control of apple fruits at the postharvest stage is quite comprehensive, making it difficult to monitor each fruit at each step, while any damage may lead to a fungal infection [19] in the stored fruits and also to the formation (and even a rapid growth) of rotten areas, also known as decayed areas [20]. Moreover, these areas are not well seen visually at early stages, and the decay growth process can be quite dynamic [21].
Artificial intelligence (AI) and its domains, including machine learning (ML) and deep learning (DL), in conjunction with the latest achievements in computer vision (CV), remote sensing, wireless sensing technologies, and the Internet of Things (IoT), have provided added value in a number of applications including the space domain [22], medicine [23], power engineering [24], agriculture [25] and food supply [26]. For example, farmers rely on CV for crop quality management, e.g., plant growth monitoring [27], fruit detection [28], disease detection [29] and weed detection [30]. This is necessary for improving the food quality of each plant at the preharvest, harvest, and postharvest stages, respectively. There is also a set of CV-based approaches for postharvest loss estimation and evaluation in stored crops [31,32,33]. However, some postharvest losses, e.g., fungal or postharvest decay zones, should be detected immediately: by the time the decayed or fungal zones in stored plants become visible (to the eye or to RGB cameras and sensors), other types of imaging data used to monitor their quality, e.g., NIR or thermal imaging, may already indicate serious spoilage. This monitoring process requires special devices and equipment, e.g., multispectral or hyperspectral cameras, which are expensive and often not easy to use, so fast detection of defects remains extremely challenging.
In this article, we present an approach based on the application of generative adversarial network (GAN) and convolutional neural network (CNN) for early detection and segmentation of decayed and fungi areas in stored apples at the postharvest stage using visible near-infrared (vis-NIR, or just VNIR) imaging data. We show how artificially generated VNIR imaging data can be used for early postharvest decay detection in stored apples and examine whether GAN- and CNN-based approaches can achieve promising results for image segmentation tasks. The idea of the proposed approach can be divided into two parts:
  • Generation of VNIR imaging data containing the stored apples with postharvest decay and fungi zones using the GAN technique.
  • Segmentation of generated VNIR images using the CNN technique in order to detect the decayed and fungi zones in the stored apples.
In this research, we study the original and generated VNIR images containing apples of four varieties with several treatments in order to simulate various situations that may occur with apples during storage. The aim is to present an approach based on DL techniques combining the GAN and CNN models for instance segmentation of postharvest decay zones and fungi areas. The GAN model provides NIR image synthesis from the input RGB data, while the CNN model is used for instance segmentation of the generated images. This is important for the proposed approach, as we aim to train and validate our models to detect the postharvest decay zones and fungi areas separately from each other. To put this idea into practice, we propose the following stages.
First, we need to select a GAN-based model for NIR image generation from the input RGB data. There are many available networks, but for image-to-image translation tasks the Pix2Pix [34], CycleGAN [35], and Pix2PixHD [36] architectures are the most widely applied in the agricultural domain [37,38,39,40,41,42,43]. We compare the Pix2Pix, CycleGAN, and Pix2PixHD models using a dataset containing paired RGB and NIR images. We work with images acquired in the VNIR range since it includes the full visible spectrum with an abutting portion of the infrared spectrum [44]. The paired images collected in the visible (380–700 nm) and VNIR (400–1100 nm) ranges are required to make sure that the decayed and fungal traits in stored apples are the same for these two ranges. Section 3.1.1, Section 3.1.2, and Section 3.1.3 provide detailed information about the Pix2Pix, CycleGAN, and Pix2PixHD models, respectively.
Second, it is necessary to choose a CNN model for segmentation of the decayed and fungal areas in the synthesized VNIR images. In this work, we implement the Mask R-CNN model, whose Feature Pyramid Network (FPN) and ResNet101 backbone allow for generating both bounding boxes (object detection) and segmentation masks (instance segmentation). In [45], we compared Mask R-CNN to such widely applied CNN-based models as U-Net [46] and Deeplab [47] for early postharvest decay detection, and Mask R-CNN achieved the highest performance in terms of average precision, namely 67.1% against 59.7% and 56.5%, respectively. Moreover, the Mask R-CNN model generates the bounding boxes and segmentation masks of the postharvest decay and fungal zones separately from each other. This is a 'tried and tested' method, which is why we use Mask R-CNN as the CNN-based segmentation model. We discuss the Mask R-CNN model in more detail in Section 3.1.4.
Finally, our plan is to implement the proposed approach and execute it on a Single Board Computer (SBC) with AI capabilities. This implementation will serve as an evaluation platform for generating segmented VNIR images that highlight any postharvest decay and fungal zones on apples. These zones may be imperceptible to the human eye, but can be detected and selected through our system. We use the NVIDIA Jetson Nano as the embedded system with AI capabilities for evaluation. It is a compact and powerful SBC supplied with accelerated libraries for computer vision and deep learning applications, and it is widely used for different real-time problems in agriculture, including weed control [48], soil mapping in greenhouses [49], and harvest product detection [50,51,52,53,54]. That is why the presented research is intended as an alternative solution to the high-cost NIR hyperspectral devices used for early postharvest decay detection and prediction in stored food. Figure 1 illustrates the proposed approach.
The contribution of this work is as follows:
  • Two experimental testbeds for paired RGB and VNIR imaging data collection under various environmental (temperature and humidity) conditions.
  • Application of CNN models for instance segmentation of decayed and fungi areas in apples at the postharvest stage.
  • Separate segmentation of fungi zones and postharvest decay areas in stored apples using the CNN model.
  • Application of the trained CNN-based model for the instance segmentation of postharvest decay zones and fungi areas in VNIR images generated by the GAN-based model.
  • Implementation of the proposed approach based on the GAN and CNN techniques for postharvest decay detection, segmentation and prediction using generated VNIR imaging data on a low-cost embedded system with the AI capabilities.
This article is organized as follows: Section 2 provides an introduction to relevant research works aimed at early postharvest decay detection and prediction in apples using RGB and VNIR imaging data with CV and ML methods. Section 3 presents the methods used in this work. Section 3.3 demonstrates the experimental testbeds used for RGB and VNIR imaging data collection and describes the procedure of data annotation. Section 4 shows the results of the comparison of the GAN techniques applied to VNIR image generation from RGB images (see Section 4.1). It also presents the application of the CNN technique for instance segmentation of the generated VNIR images (see Section 4.2), and describes the embedded system running the proposed GAN and CNN (see Section 4.3). Conclusions and a discussion of future work are summarized in Section 5.

2. Related Works

2.1. CV Approaches Based on CNN Models Using RGB Imaging Data

CV techniques with the implementation of ML and DL methods are becoming one of the most useful tools for fruit quality estimation and evaluation at the postharvest stage.
The majority of approaches are based on the collection and analysis of visible morphological traits, such as changes in fruit shape, size, or color during storage, from stored fruits with CNN models using RGB images as the most acceptable and user-friendly type of data. RGB imagery closely resembles human vision because red, green and blue are the primary colors in this color model, which makes the process of visible non-destructive quality monitoring and defect detection of stored food production easy and understandable [55]. The majority of cameras and devices for RGB imaging data collection contain a patterned Bayer filter mosaic consisting of squares of four pixels with one red, one blue and two green filters [56]. Usually, the Bayer filter is located on the camera chip.
Generally, a CNN model contains convolutional and pooling layers (stacked one after another), a flatten layer, fully connected layers and a softmax classifier. The convolutional and pooling layers form the feature extraction part, while the classification part involves the flatten layer, the fully connected layers and the softmax classifier. When an image reaches the input layer, filters in the convolution layers select feature neurons. An activation function (Sigmoid, Rectified Linear Unit (ReLU), or Softplus) is applied to obtain nonlinear results by passing the feature neurons through it, and the resulting feature map size is reduced by the pooling layers. The flatten layer is the first input layer of the classifier, as it holds the feature map produced by the convolution layers. The fully connected layer transforms the obtained feature neurons into a matrix on which the classification method is applied.
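To make this layer stack concrete, the following is a minimal sketch of such a classifier in PyTorch; the layer sizes, the 256 × 256 input, and the four-class output are illustrative assumptions rather than a configuration used in this work.

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """Minimal CNN: convolution/pooling for feature extraction,
    flatten + fully connected layer + softmax for classification."""
    def __init__(self, num_classes=4):  # four classes chosen only for illustration
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # feature extraction
            nn.ReLU(),                                     # nonlinear activation
            nn.MaxPool2d(2),                               # reduce feature map size
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                                  # flatten layer feeds the classifier
            nn.Linear(32 * 64 * 64, num_classes),          # assumes 256 x 256 input images
        )

    def forward(self, x):
        logits = self.classifier(self.features(x))
        return torch.softmax(logits, dim=1)                # class probabilities

model = SimpleCNN()
probs = model(torch.randn(1, 3, 256, 256))                 # one dummy RGB image
```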
In this way, the CNN structure showed its efficiency in classification, and then in detection and segmentation tasks using RGB imaging data. For example, an automated banana grading system was reported in [57], where a fine-tuned VGG-16 deep CNN model was applied for banana classification using such traits as skin quality, size, and maturity with the acquired RGB imagery data. A similar approach was proposed in [58], where the VGG-16 model was trained to predict the date of the fruit ripening stage using RGB images with an overall classification accuracy of 96.98%.
In [59], the authors developed an automated online carrot grading system, where a lightweight carrot defect detection network (CDDNet) based on ShuffleNet [60] and transfer learning was implemented for carrot quality inspection using RGB and grayscale images. The CDDNet was compared to other CNN models including AlexNet, ResNet50, MobileNet v2, and ShuffleNet, and it demonstrated good performance in terms of detection accuracy and time consumption for binary classification of normal and defective carrots (99.82%), and for classification of normal, bad-spot, abnormal, and fibrous-root carrots (93.01%). However, the images contained carrots of different sizes and appearances, and the idea of the presented approach was to detect carrots with visible defects without taking the spoilage stage of the defective carrots into account. Moreover, there was no mention of the possible situation when carrots are infected but still show no visible traits of spoilage.
In [61], the authors report on the implementation of the DeeplabV3+ model [62] with classical image processing algorithms, e.g., threshold binary segmentation, morphological processing and mask extraction, for banana bunch segmentation during sterile bud removal (SBD) on a total of 1500 RGB images. Moreover, a YOLOv5-Banana model [63] was applied for banana finger segmentation and centroid point extraction, while the edge detection and centroid extraction of banana fingers included binarization, a morphological opening operation, Canny edge detection, and extraction of the centroid point set. DeeplabV3+ was reported to achieve a detection accuracy rate of 86%, a mean intersection over union (MIoU) of 0.878 during the debudding period for target segmentation, and a mean pixel precision of 0.936. YOLOv5-Banana achieved a 76% detection accuracy rate for the banana bunches during the harvest period. The authors also designed and presented software to estimate the banana fruit weight during the harvest period.
In [64], several CNN-based models including VGG-16, VGG-19, ResNet50, ResNet101, and ResNet152 were compared to each other for the classification of such physiological disorders in stored apples as bitter pit, shriveling, and superficial scald. The authors acquired a dataset containing 1080 RGB images of apples (dataset-1) and 4320 augmented images (dataset-2) with the aim of improving data representation during model training and of accounting for the apple position under the monitoring camera and the lighting conditions during storage. The CNN-based models were used and compared for feature extraction, while such classical ML methods as support vector machines (SVM), random forest (RF), the k-nearest neighbors algorithm (kNN), and XGBoost were used for classification of the extracted features. The highest average accuracy was reported for the VGG-19 model in conjunction with the SVM method on dataset-1 and dataset-2, with 96.11% and 96.09%, respectively.

2.2. Machine Learning and Deep Learning Methods for NIR Data Analysis

NIR spectroscopy covers the spectral region from 780 to 2500 nm, which cannot be seen with human eyes, but it allows for obtaining spectral information from ten (generally referred to as multispectral data [65]) to more than a hundred wavebands (referred to as hyperspectral data [65]). Measurements performed in the visible (380–700 nm), visible near-infrared (vis-NIR, or just VNIR, 400–1100 nm), and NIR (780–2500 nm) ranges provide the user with more detailed information on the chemical composition of scanned samples. In our case, by samples we mean stored plants, crops and fruits. State-of-the-art cameras and devices for hyperspectral data acquisition provide not only spectral information about the scanned samples, but also allow the users to obtain images of the scanned zones in the range of the device bands. Spectral information on chemical composition from a wide range of wavebands has simplified the procedure of food quality monitoring and defect detection at the postharvest stage. Moreover, not only decay zones may occur in stored fruits, but also fungi like Sclerotinia sclerotiorum [66], Penicillium expansum [67], Botrytis cinerea [68], Botryosphaeria dothidea [69] and many others, which should be detected immediately at an early stage. Otherwise, the appearance and growth of decayed and fungi zones may lead to the loss of all stored fruits. It is vital to distinguish various types of postharvest losses, e.g., postharvest decay, and diseases, e.g., various fungi varieties, since each type of loss requires a specific type of treatment or the removal of spoiled samples from storage. It should be noted here that the formation of fungal areas may not always lead to the formation of decayed areas. That is why we should detect and identify the fungi and postharvest decay zones separately from each other [70,71,72].
Both classical ML methods and the DL techniques based on the CNN models are widely used for postharvest losses evaluation in stored plants using VNIR and NIR imaging and spectral data.
In [73], the authors compared several ML methods including linear discriminant analysis (LDA), random forest (RF), support vector machines (SVM), kNN, gradient tree boosting (GTB), and partial least squares-discriminant analysis (PLS-DA) for early codling moth zone detection in stored "Gala", "Granny Smith", and "Fuji" apples. The research was carried out at the pixel level using NIR hyperspectral reflectance imaging data in the range of 900–1700 nm with an optimal selection of wavelengths. GTB was reported to obtain the best results for pixel-level classification, with 97.4% total accuracy on the validation dataset.
In [74], the authors implemented the AlexNet model for detecting pesticide residues in postharvest apples using hyperspectral imaging data. There were 12,288 hyperspectral images acquired for the training set and 6144 images for the test set in the 865.11–1711.71 nm range (the camera included 256 bands) with a spectral resolution of 3.32 nm. The Otsu segmentation algorithm [75] was used for positioning the apples and pesticide residues (the regions of interest, or ROIs), while the deep AlexNet [76] provided pesticide category detection. AlexNet was reported to show better results in terms of detection accuracy and time consumption in comparison to the SVM and kNN algorithms (99.09% and 0.0846 s against 74.34% and 11.2301 s, and 43.75% and 0.7645 s, respectively).
As we can see, NIR hyperspectral and multispectral imaging data ensure earlier disease detection with more detail than RGB imaging, but they also require sophisticated equipment, which usually includes a camera with many wavebands, an imaging spectrograph (or spectrometer), a sample stage, illumination lamps and a lighting system, as well as supplementary software and devices for capturing and processing NIR data and images [77,78,79]. This is the reason why hyperspectral imaging devices are so expensive and may cost from thousands to tens of thousands of USD [80]. These high prices reduce the availability and usage of hyperspectral cameras for farmers and food selling companies to perform food quality control at postharvest stages. This issue has raised a demand for developing new approaches for NIR imaging data generation without using high-cost hyperspectral systems.

2.3. GAN-Based Models for RGB and NIR Data Analysis

Generative Adversarial Networks (GANs) and, in particular, conditional GANs (cGANs) [81] have demonstrated their effectiveness in a variety of tasks in the agricultural domain, including remote sensing [82], image augmentation [83], animal farming [84], and plant phenotyping [85]. The general idea of a GAN is based on the use of two neural networks: the first network, called the generator (generative part, G), aims to create plausible samples, while the second network, called the discriminator (adversarial part, D), learns to verify whether a created sample is real or fake. GANs are also applied to so-called image-to-image translation tasks, i.e., where high-quality image synthesis from one domain to another is needed. For example, GAN-based models were successfully applied for multi-channel attention selection in RGB imagery considering an external semantic guidance in [86,87], MRI data estimation in [88], diffusion model evaluation [89], and NIR image generation from input RGB images in [82,90,91].
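As a schematic illustration of this generator–discriminator game, the following minimal PyTorch sketch alternates one discriminator update and one generator update; the tiny fully connected networks and the random stand-in data are assumptions chosen only to keep the example self-contained.

```python
import torch
import torch.nn as nn

# Placeholder networks: any generator G and discriminator D with these
# input/output shapes could be substituted.
G = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1), nn.Sigmoid())

bce = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))

real = torch.rand(16, 784)                  # a batch of real samples (stand-in data)
z = torch.randn(16, 100)                    # random noise vector

# Discriminator step: learn to tell real samples from generated ones
fake = G(z).detach()
loss_d = bce(D(real), torch.ones(16, 1)) + bce(D(fake), torch.zeros(16, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: learn to fool the discriminator
fake = G(z)
loss_g = bce(D(fake), torch.ones(16, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```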
Therefore, approaches based on GAN models allow synthesizing high-quality NIR images from input RGB images while preserving detailed spectral information. At the same time, it is crucial not only to transform the image together with all the relevant information, but also to segment the various types of postharvest diseases and defects separately from each other in stored food production in order to choose a specific processing strategy for defective or spoiled food samples. At present, most GAN models provide only the image transformation from one domain to another, but not object detection or instance segmentation in the synthesized images. However, as shown in Section 2.1, CNN models demonstrate reasonably good results for object detection and instance segmentation both for RGB and NIR images.

3. Materials and Methods

3.1. DL Techniques

3.1.1. Pix2Pix

The Pix2Pix model [34] is a type of cGAN that has been demonstrated on a range of image-to-image translation tasks, such as converting a satellite image to a corresponding map, or black and white photos to color images. In conditional GANs, the generation of the output image is conditioned on an input image. In the case of the Pix2Pix model, the generation process is conditioned on the source image. The discriminator observes both the source image (domain A) and the target image (domain B) and must determine whether the target is a plausible transformation of the source image. The generator is trained via the adversarial loss, which encourages it to produce plausible images in the target domain. The generator is also updated via an L1 loss measured between the generated image and the expected output image. This additional loss encourages the generator model to create plausible translations of the source image. Mathematically, the whole process in Pix2Pix can be defined as:
$$\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y \sim p_{data}(x,y)}\left[\log D(x, y)\right] + \mathbb{E}_{x,z \sim p_{data}(x,z)}\left[\log\left(1 - D(x, G(x, z))\right)\right]$$
where G is the generator, D is the discriminator, x is the observed image, y is the target image, z is the random noise vector, and λ controls the relative importance of the two objectives. The following objective function is used to train the model:
$$G^{*} = \arg \min_{G} \max_{D} \mathcal{L}_{cGAN}(G, D) + \lambda \mathcal{L}_{L1}(G)$$
Pix2Pix requires perfectly aligned paired images for the training procedure. In this research, CNN-based architectures are used both as the generator and the discriminator. Generally, the U-Net model [46] is applied in Pix2Pix as the generator: U-Net is trained to generate, from the images in domain A, images similar to those in domain B. The discriminator is usually a PatchGAN (also known as a Markovian discriminator [92]), and it is trained simultaneously to distinguish the generated images from the real images in domain B. The reconstruction loss measures the similarity between the real and the generated images. Figure 2 shows the block diagram of Pix2Pix.
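A minimal sketch of the combined Pix2Pix generator objective (adversarial term plus weighted L1 reconstruction) is given below; `generator` and `discriminator` stand for the U-Net and PatchGAN networks and are assumed to be defined elsewhere, so this illustrates the loss composition rather than the exact training code used in this work.

```python
import torch
import torch.nn as nn

# Assumes `generator` (U-Net-like) and `discriminator` (PatchGAN-like, outputting
# logits) are defined elsewhere; `rgb` is the domain-B input, `vnir` the paired
# domain-A target.
adv_loss = nn.BCEWithLogitsLoss()   # adversarial term of the cGAN objective
l1_loss = nn.L1Loss()               # reconstruction term, weighted by lambda
lambda_l1 = 100.0                   # the L1 weight reported in Section 4.1

def generator_step(generator, discriminator, rgb, vnir):
    fake_vnir = generator(rgb)
    # PatchGAN scores the (input, output) pair patch by patch
    pred_fake = discriminator(torch.cat([rgb, fake_vnir], dim=1))
    loss_adv = adv_loss(pred_fake, torch.ones_like(pred_fake))
    loss_rec = l1_loss(fake_vnir, vnir)
    return loss_adv + lambda_l1 * loss_rec
```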

3.1.2. CycleGAN

The goal of the CycleGAN model [35] is to learn a mapping G: X → Y such that the distribution of images from G(X) is indistinguishable from the distribution Y, using unpaired sets of images. Because this mapping is highly underconstrained, it is coupled with an inverse mapping F: Y → X, and a cycle consistency loss is introduced to enforce F(G(X)) ≈ X and vice versa. For the mapping function G: X → Y and its discriminator D_Y, the adversarial loss is
$$\mathcal{L}_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y}\left[\log D_Y(y)\right] + \mathbb{E}_{x}\left[\log\left(1 - D_Y(G(x))\right)\right]$$
and the objective is as follows:
$$G^{*}, F^{*} = \arg \min_{G,F} \max_{D_X, D_Y} \mathcal{L}(G, F, D_X, D_Y)$$
CycleGAN learns a translation mapping in the absence of aligned paired images. The image translated from domain A to domain B by the CNN-based generator (G1) is converted back to domain A by another CNN-based generator (G2), and vice versa, in an attempt to optimize the cycle-consistency loss in addition to the adversarial loss. The block diagram of CycleGAN is shown in Figure 3.
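The cycle-consistency idea can be sketched as follows; `G1` and `G2` stand for the two CNN-based generators mentioned above, and the weight of 10 for the cycle term is a commonly used default rather than a value reported in this work.

```python
import torch.nn as nn

l1 = nn.L1Loss()

def cycle_consistency_loss(G1, G2, x, y, lambda_cyc=10.0):
    """G2(G1(x)) should reconstruct x, and G1(G2(y)) should reconstruct y,
    which is what allows CycleGAN to train without paired images.
    G1: domain A -> domain B, G2: domain B -> domain A (placeholder networks)."""
    loss_a = l1(G2(G1(x)), x)   # forward cycle A -> B -> A
    loss_b = l1(G1(G2(y)), y)   # backward cycle B -> A -> B
    return lambda_cyc * (loss_a + loss_b)
```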

3.1.3. Pix2PixHD

The Pix2PixHD model [36] is a modification of the solution realized in the Pix2Pix model that includes several improvements: a coarse-to-fine generator, multi-scale discriminators, and an improved adversarial loss. Pix2PixHD generally consists of a global generator G1 and a local enhancer G2 (see Figure 4, where *** denotes the residual blocks). Throughout the training process, the global generator is trained first, followed by the training of the local enhancer in a progressive manner based on their respective resolutions. Subsequently, all the networks are fine-tuned jointly. The purpose of this generator is to efficiently combine global and local information for the task of image synthesis. Three discriminators are used for effective detail capturing at multiple scales.
A significant performance boost was provided by the loss modification: two extra terms, the feature matching loss $\mathcal{L}_{FM}$ and the perceptual loss $\mathcal{L}_{VGG}$ [93], were added as objective functions. The feature matching loss stabilizes training by forcing the generator to produce natural statistics at multiple scales:
$$\mathcal{L}_{FM}(G, D_k) = \lambda_{FM}\, \mathbb{E}_{y,x} \sum_{i=1}^{T} \frac{1}{N_i} \left[ \left\| D_k^{(i)}(y, x) - D_k^{(i)}(y, G(y)) \right\|_1 \right]$$
where $D_k^{(i)}$ denotes the output of the i-th layer of the discriminator $D_k$, $N_i$ is the number of elements in that layer, and T is the total number of layers.
$$\mathcal{L}_{VGG} = \lambda_{VGG}\, \mathbb{E}_{y,x} \sum_{i=1}^{T} \frac{1}{M_i} \left[ \left\| F^{(i)}(x) - F^{(i)}(G(y)) \right\|_1 \right]$$
where $F^{(i)}$ denotes the i-th layer, with $M_i$ elements, of the VGG network.
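A minimal sketch of the feature-matching term is shown below; `feats_real` and `feats_fake` are assumed to be lists of intermediate activations taken from one discriminator for the real and the generated image, and the weight of 10 mirrors the regularization values listed later in Section 4.1.

```python
import torch.nn.functional as F_nn

def feature_matching_loss(feats_real, feats_fake, lambda_fm=10.0):
    """L1 distance between discriminator features of the real and the generated
    image, accumulated over layers, as in the L_FM term above. The per-element
    mean of l1_loss plays the role of the 1/N_i factor."""
    loss = 0.0
    for f_real, f_fake in zip(feats_real, feats_fake):
        loss = loss + F_nn.l1_loss(f_fake, f_real.detach())
    return lambda_fm * loss
```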

3.1.4. Mask R-CNN

Mask R-CNN [94] is a CNN-based architecture that provides instance segmentation of various objects in images. These objects are usually called Regions of Interest (ROIs). Mask R-CNN is the latest version of the R-CNN model [95], where R-CNN stands for Regions detected with a CNN: R-CNN was first improved to Fast R-CNN [96], then to Faster R-CNN [97], and, finally, to Mask R-CNN. As mentioned earlier, in the original R-CNN the ROIs are detected with a selective search over CNN features. In Mask R-CNN, this selective search is replaced by a Region Proposal Network (RPN) that initiates and identifies the ROIs, and a new branch is added for predicting the mask that covers the found region, i.e., an object in the image. The RPN and the ResNet101 backbone allow for object detection (bounding box generation) and instance segmentation even when several ROIs of different sizes partially overlap in one image. Figure 5 presents a block diagram of the Mask R-CNN architecture.
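For illustration, the off-the-shelf Mask R-CNN implementation in torchvision exposes exactly these per-ROI outputs (boxes, labels, and masks); this sketch uses the ResNet50-FPN variant with COCO weights and a random stand-in image, and is not the Detectron2-based setup actually trained in Section 4.2.

```python
import torch
import torchvision

# Pre-trained Mask R-CNN with a ResNet50-FPN backbone, shown only to
# illustrate the per-ROI outputs (boxes, labels, scores, masks).
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 426, 339)          # stand-in for one VNIR image tensor in [0, 1]
with torch.no_grad():
    output = model([image])[0]           # list of images in, list of dicts out

boxes = output["boxes"]                  # bounding boxes, one per detected ROI
labels = output["labels"]                # predicted class per ROI
scores = output["scores"]                # confidence per ROI
masks = output["masks"]                  # per-ROI segmentation masks (N, 1, H, W)
```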

3.2. Performance Metrics

In this study, we compare the original VNIR images with the VNIR images generated by the Pix2PixHD model. To perform this, we considered the Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Mean Squared Error (MSE), Root Mean Square Error (RMSE), Peak Signal to Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM), and Feature Similarity Index Measure (FSIM), defined as follows:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - x_i \right|$$
$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - x_i}{y_i} \right|$$
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - x_i \right)^2$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - x_i \right)^2}$$
$$\mathrm{PSNR} = 10 \log_{10} \frac{R^2}{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - x_i \right)^2}$$
$$\mathrm{SSIM} = l(x_i, y_i)^{\alpha} \cdot c(x_i, y_i)^{\beta} \cdot s(x_i, y_i)^{\gamma}$$
$$\mathrm{FSIM} = S_{PC}(x_i, y_i)^{\alpha} \cdot S_{GM}(x_i, y_i)^{\beta}$$
where $y_i$ is the generated (synthesized) image, $x_i$ is the original image, n is the number of observations, R is the maximum possible pixel value of the image, l is the luminance, c is the contrast, s is the structure, α, β, and γ are the weights, $S_{PC}$ is the phase congruency similarity (invariant to light variation in images), and $S_{GM}$ is the gradient magnitude similarity.
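Most of these image-quality metrics can be computed with NumPy and scikit-image as sketched below; FSIM is omitted since it is not available in scikit-image, and the sketch assumes both images are single-channel 8-bit arrays.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def image_quality_report(original, generated):
    """Pixel-wise and structural metrics between an original VNIR image
    and its generated counterpart (both greyscale uint8 arrays)."""
    x = original.astype(np.float64)
    y = generated.astype(np.float64)
    mae = np.mean(np.abs(y - x))
    mape = 100.0 * np.mean(np.abs((y - x) / np.clip(np.abs(y), 1e-6, None)))
    mse = np.mean((y - x) ** 2)
    return {
        "MAE": mae,
        "MAPE": mape,
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "PSNR": peak_signal_noise_ratio(original, generated, data_range=255),
        "SSIM": structural_similarity(original, generated, data_range=255),
    }
```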
We used precision, recall, mean Intersection over Union (IoU), mean Average Precision (mAP), and F1-score to verify the efficiency of the Mask R-CNN model on the synthesized VNIR images during the training and validation stages; these are defined as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
$$\mathrm{IoU} = \frac{\mathrm{Area\ of\ Overlap}}{\mathrm{Area\ of\ Union}}$$
$$\mathrm{AP} = \sum_{n} \left( \mathrm{Recall}_n - \mathrm{Recall}_{n-1} \right) \cdot \mathrm{Precision}_n$$
$$F1\text{-score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
Precision and recall are based on True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN). TP denotes instances in which the model correctly predicts a specific object from a given class in images, TN denotes instances in which the model correctly predicts an object that does not belong to a given class, and FP denotes instances in which the model predicts a specific class, but the object does not actually belong to that class. In contrast, FN are the cases in which the model makes no prediction of a particular class, but the object actually belongs to one of the classes. The object classes are described in Section 3.4.
AP is the area under the precision–recall curve. It summarizes the precision–recall curve as the weighted mean of precisions at each IoU threshold, with the increase in recall from the preceding threshold used as the weight. It is calculated using the AP equation above, where $\mathrm{Precision}_n$ and $\mathrm{Recall}_n$ are the precision and recall at the n-th IoU threshold.
The mAP score is the AP averaged over all classes or over all IoU thresholds. In our scenario, since AP is averaged across all the classes, there is no difference between AP and mAP. We calculated AP values for IoU = 0.50 ($AP_{50}$), for IoU = 0.75 ($AP_{75}$), for objects with an area smaller than 32² pixels ($AP_S$), for objects with an area ranging from 32² to 96² pixels ($AP_M$), and for objects with an area larger than 96² pixels ($AP_L$).
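The detection metrics above reduce to a few lines of Python; this sketch assumes that the true/false positive and false negative counts at a fixed IoU threshold are already available, and that the precision–recall pairs passed to the AP function are ordered by increasing recall.

```python
def detection_scores(tp, fp, fn):
    """Precision, recall and F1-score from true/false positive and
    false negative counts at a fixed IoU threshold."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def average_precision(precisions, recalls):
    """AP as the weighted sum over the precision-recall curve, matching the AP
    definition above (recall values assumed sorted in increasing order)."""
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```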

3.3. Experimental Testbeds and Data Acquisition

In this section, we describe the apple fruits used for the experiments and present experimental testbeds for data collection:
(i) The experimental testbed for acquiring the dataset containing paired RGB and VNIR images of stored apples;
(ii) The experimental testbed for stored apple VNIR images collection containing VNIR images acquired by a multispectral camera.
The first testbed is designed for paired RGB and VNIR image collection in order to train and validate the GAN-based DL models for VNIR image translation from RGB images (see Section 3.3.1). The second testbed is used for collecting VNIR images of the stored apples, as well as for training and validating the CNN-based model for postharvest decay zone detection and segmentation in the generated VNIR images (see Section 3.3.2).

3.3.1. Experimental Testbed for Paired RGB and VNIR Imaging Data Collection

We selected 16 apples of four kinds ("Delicious", "Fuji", "Gala", "Reinette Simirenko") and divided them into four rows according to their kind (each row corresponds to one apple kind). Each row contained four apples with a different treatment from left to right: an apple with no treatment, a thoroughly washed and wiped apple, a mechanically damaged apple, and a shock-frozen apple supercooled below −20 °C, respectively. The apple without treatment serves as a reference for each kind. The thoroughly washed apple corresponds to the removal of the natural protective wax layer from the apple. The mechanically damaged and the shock-frozen apples simulate violated storage conditions. Figure 6 shows these apples.
The first testbed is used for data collection under the recommended room storage conditions: the temperature ranges from 25 °C to 32 °C with a Relative Humidity (RH) of 34% [98]. The testbed is built of aluminum frames and is 1 m long, 1 m wide, and 1.7 m high. The apples lie on a table with a white tray at a height of 1.3 m above the floor. We also use a Canon M50 camera and the multispectral camera CMS-V1 CMS18100073 (CMS-V), attached at the middle top of the frame and connected to a laptop via a USB hub. The distance between the table with the apples and the cameras is 500 mm. The lamps allowed us to simulate real storage conditions for apples as well as to collect images under full and partial illumination. Detailed information about the acquired dataset and the first experimental testbed is given in [99]. Figure 7 shows the first testbed.
The multispectral camera CMS-V allows acquiring images in the range of 561–838 nm, covering the visible and NIR ranges. The camera imager uses a modified Bayer matrix made of groups of 3 × 3 pixels, called macro-pixels, filtering 3 × 3 (9) spectral bands. The raw image delivered by the camera is built of 9 interleaved spectral sub-images (8 colors + 1 panchromatic) with a 1280 × 1024 pixel resolution. Each RGB image relates to 9 images from the following spectral bands: channel0 = 561 nm, channel1 = 597 nm, channel2 = 635 nm, channel3 = 673 nm, channel4 = 724 nm, channel5 = 762 nm, channel6 = 802 nm, channel7 = 838 nm, and channel8 (panchromatic channel) = 0 nm. The resolution of the nine sub-images is 426 × 339 pixels.
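For illustration, the nine spectral sub-images can be recovered from a raw frame by sub-sampling each 3 × 3 macro-pixel, as sketched below; the band-to-position mapping inside the macro-pixel and the handling of border pixels are assumptions, since the vendor demosaicing of the CMS-V is not described here.

```python
import numpy as np

def split_macro_pixels(raw):
    """Split a raw CMS-V-like frame into nine spectral sub-images by taking
    one pixel per 3 x 3 macro-pixel; the band order is assumed, not specified."""
    h, w = (raw.shape[0] // 3) * 3, (raw.shape[1] // 3) * 3
    raw = raw[:h, :w]                                    # drop incomplete macro-pixels
    bands = [raw[i::3, j::3] for i in range(3) for j in range(3)]
    return np.stack(bands)                               # shape: (9, h // 3, w // 3)

raw_frame = np.zeros((1024, 1280), dtype=np.uint16)      # stand-in for a raw frame
sub_images = split_macro_pixels(raw_frame)               # nine sub-images close to 426 x 339
```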
We acquired 1305 sequential RGB images and 1305 corresponding VNIR images in the 838 nm range to observe the decay dynamics in the presented apples. Examples of the images are shown in Figure 8.

3.3.2. Experimental Testbed for VNIR Imaging Data Collection

In this experiment, we selected 22 apples of the "Alesya", "Fuji", "Golden" and "Reinette Simirenko" seasonal varieties for data acquisition. The apples were between 8 and 10 cm in diameter, and most of them were multicolored with red and yellow sections. There were also some apples containing fungi zones, i.e., grey-brown moldy areas, as examples of apples stored under violated storage conditions. These apples were used in order to increase the data representation for early postharvest decay detection tasks in stored apples using VNIR imaging data. These apples are demonstrated in Figure 9.
The second testbed presented in Figure 10 is a greenhouse that includes silicon frames and five shelves, a plastic wrap, a multispectral camera, 10 LED strip lights with red/blue diodes, a power supply (total power is 150 Watt) for controlling the LEDs, a logger, and a pallet with apples. It can be used for the simulation of different processes related to plant breeding in various environmental conditions including extremely dry or wet modes. Temperature and humidity regulation in the testbed is provided with the LED strip lights, the plastic wrap, and several water pallets located on three lower bottom separate shelves.
The silicon frames are the basic elements of the presented greenhouse, with the following dimensions: 170 cm in height, 48 cm in length, and 67 cm in width. Two strip lights were fixed on each shelf, while the multispectral camera and the pallet with the apples were fixed on separate shelves (see Figure 10). Each selected strip has 60 LEDs with wavelengths of 650–660 nm (red light LEDs) and 455–465 nm (blue light LEDs), matching the highest chlorophyll absorption in plants to provide the most effective photosynthesis processes. This is also relevant for crops and plants at the postharvest stage [100]. Keeping the quality of plant production is another reason why these LED strip lights are used in the greenhouse. We rely on the power supply (12 V DC, 150 W, IP33) as the energy source for the SMD 5050 LED strip lights, and on a GL100-N/GL100-WL logger by Graphtech Corporation, supplied with the GS-TH sensor module, for registering temperature and humidity values during the data collection process.
For VNIR image capturing, the multispectral camera CMS-V described in Section 3.3.1 was also used. The camera was connected via a USB-A cable to an HP EliteBook 820 G3 laptop with an Intel Core i3-6100 CPU at 2.30 GHz, where all the images were acquired and saved as JPG files with 426 × 339 pixels.
We obtained 1029 sequential VNIR images in the 838 nm range collected from the CMS-V camera's channel7. These images were acquired under a temperature range from 35 °C to 40 °C and an RH of 70% with the goal of simulating a potential violation of the storage process for the selected apples. This violation is necessary to speed up the decay processes in the apples. We also collected 100 sequential RGB images (see the example in Figure 11) for the CNN-based model training and validation, with the aim of demonstrating the up-to-date approach based on the combination of pre-trained GAN-based and CNN-based models. The sequential RGB images had dimensions of 339 pixels × 426 pixels × 3 channels (or simply 339 × 426 × 3).

3.4. Data Annotation

In order to apply a CNN-based deep learning model for the image instance segmentation, we used the Supervisely Ecosystem [101] for annotation and labeling of VNIR imaging data. It is worth reiterating here that we provide this labeling only for the VNIR images acquired with the testbed, described in Section 3.3.2 as these images were specially collected as the sequential VNIR imaging dataset for the DL model training and validation on early postharvest decay detection and segmentation of apples.
Four classes of objects in the images are defined: Healthy apple, Decay, Fungi, and Spoiled apple. By Healthy apple we mean apples without any visible damage or spoiled zones in the images. The dark gray areas with postharvest decay in apples are labeled Decay. By Fungi we indicate white moldy zones in apples. We thus distinguish the postharvest decay zones, marked as the Decay class, from the moldy zones, marked as the Fungi class. If an apple has objects of the Fungi class, it is assumed to have been stored under violated storage conditions, e.g., extreme temperature or humidity, which resulted in the apple's full spoilage. Apples with only postharvest decay zones (Decay) can be sent for recycling, while apples with moldy zones (Fungi) must be removed from the others in order to prevent the spoilage of all samples. We also defined the Spoiled apple class: stored apples with more than 50 percent of their surface covered by spoiled areas (Decay objects) or moldy zones (Fungi objects). Figure 12 illustrates the procedure of image annotation.

4. Results and Discussion

4.1. Image-to-Image Models Comparison for VNIR Images Generation from RGB

In this section, we show the results of comparing the GAN-based deep learning models for VNIR image translation from RGB images. We provide this comparison on the dataset of sequential RGB images and corresponding VNIR images in the 838 nm range presented in Section 3.3.1. To estimate the performance, we split the data into a train set (80%) and a validation set (20%). Augmentation techniques such as random rotations, shifts, zoom, and flips are implemented to increase the data representativity and to keep the models efficient during the training and validation stages. We do not use transformations such as contrast/brightness adjustments because they may lead to information loss in the acquired VNIR imaging data. Taking into account that image-to-image translation is also known as translation from domain B to domain A (or just BtoA), it was necessary to label the domain B and domain A images from our acquired paired dataset. We identified the RGB images as domain B and the VNIR images as domain A. All models were trained for 200 epochs, where the first 100 used a constant learning rate and the remaining 100 a learning rate linearly decreasing to zero. The model training and validation were realized via Python scripts launched in Google Colab.
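A minimal torchvision sketch of such a geometry-only augmentation pipeline is shown below; the parameter ranges are illustrative assumptions, and for paired RGB/VNIR data the same random parameters would have to be applied to both images of a pair, which is omitted here for brevity.

```python
import torchvision.transforms as T

# Geometric augmentations only: brightness/contrast adjustments are deliberately
# excluded so that the VNIR intensity information is not distorted.
augment = T.Compose([
    T.RandomRotation(degrees=15),                      # random rotations
    T.RandomAffine(degrees=0, translate=(0.1, 0.1)),   # random shifts
    T.RandomResizedCrop(size=256, scale=(0.8, 1.0)),   # random zoom
    T.RandomHorizontalFlip(),                          # random flips
])
```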
For the CycleGAN model, we use a ResNet encoder–decoder architecture consisting of two downsampling layers, six ResNet bottleneck blocks and two upsampling layers. We also employ an Adam optimizer with a learning rate of 0.0002 and momentum parameters $\beta_1$ = 0.5 and $\beta_2$ = 0.999.
For the Pix2Pix model training, we fixed the same parameters: batch size = 1, $\beta_1$ = 0.5, $\beta_2$ = 0.999, and learning rate = 0.0002. The U-Net generator had 4 downsampling blocks. Optimization included a generator loss optimization step and a discriminator loss optimization step, respectively. The regularization parameters are as follows: $\lambda_{VGG} = \lambda_{Feat} = 10$, $\lambda_{L1} = 100$.
For the Pix2PixHD model, we also use the same parameters: Adam optimizer, batch size = 1, $\beta_1$ = 0.5, $\beta_2$ = 0.999, and learning rate = 0.0002.
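The shared optimizer settings and the learning-rate schedule (constant for the first 100 epochs, then linearly decaying to zero) can be expressed as follows; the placeholder network stands in for any of the generators or discriminators above.

```python
import torch

model = torch.nn.Linear(1, 1)   # placeholder for a generator or discriminator network
optimizer = torch.optim.Adam(model.parameters(), lr=0.0002, betas=(0.5, 0.999))

def lr_lambda(epoch, total_epochs=200, constant_epochs=100):
    """Constant learning rate for the first 100 epochs, then linear decay to zero."""
    if epoch < constant_epochs:
        return 1.0
    return max(0.0, 1.0 - (epoch - constant_epochs) / float(total_epochs - constant_epochs))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_lambda)

for epoch in range(200):
    # ... one training epoch over the paired RGB/VNIR dataset would run here ...
    scheduler.step()
```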
Figure 13 shows the discriminator loss values of the CycleGAN (Figure 13a), Pix2Pix (Figure 13b), and Pix2PixHD (Figure 13c) models during the training stage. We show the models' discriminator losses because they reflect the ability of the GAN-based models to judge the quality of the VNIR images synthesized by the generator in comparison to the original VNIR images.
For the selected GAN-based models, we see that the training stage is unstable, but the discriminator losses tend to decrease over time. Pix2PixHD shows the lowest loss value in comparison to CycleGAN and Pix2Pix. For model validation, we reconstructed the VNIR images using the model weights acquired during training. We used the MAE, MAPE, MSE, PSNR and SSIM metrics to estimate the quality of the reconstructed VNIR images in comparison with the original VNIR images. Figure 14 shows these images (with 'cyclegan', 'pix2pix' and 'pix2pixHD' labels, respectively) in comparison to the original VNIR image ('reference' label), visualized via Python tools.
Table 1 summarizes the performance of the considered models, where the results for the Pix2PixHD model are highlighted in bold. Considering both the pixel-based and the image metrics, the results are promising. The generated images look similar to the original ones: the apples, the overall light intensity, and the decay regions are mainly preserved relative to the ground truth. However, all the models produce particular artifacts. The CycleGAN model produces big stamp-like artifacts, and many decayed zones in the apples are missed. In terms of the metrics mentioned in Section 3.2, the Pix2Pix and Pix2PixHD models perform comparably and much better than CycleGAN, and the decay regions are preserved relatively well, although an intensity level mismatch can be seen. The Pix2PixHD model produces perceptually good images preserving the features important for the task, with a mean error level of 0.6%. In terms of important image quality metrics, such as PSNR and SSIM, the Pix2PixHD model showed higher values than Pix2Pix (46.859 against 46.433, and 0.972 against 0.955, respectively). Taking the results of this comparison into account, we decided to use the Pix2PixHD model for VNIR image generation from RGB during the next stages.

4.2. Segmentation of Generated VNIR Images for Early Postharvest Decay Detection in Apples

In this section, we apply the CNN-based model for instance segmentation of the generated VNIR images. Based on the results reported in Section 4.1, we use the Pix2PixHD model for VNIR image generation. The dataset containing 456 images of stored apples (see Section 3.3.2) was used as the input to the trained Pix2PixHD weights to generate VNIR images. Examples of VNIR images synthesized from the corresponding input RGB images are presented in Figure 15. Compared with the images synthesized during the Pix2PixHD training stage (see Section 4.1), the PSNR and SSIM values increased from 46.859 to 52.876 and from 0.972 to 0.994, respectively.
Mask R-CNN is used as the CNN-based model for instance segmentation of the images. However, before applying Mask R-CNN to images synthesized with Pix2PixHD, it was necessary to train Mask R-CNN on real VNIR images to detect and segment the fungi and decayed areas in stored apples. We used the labeled dataset containing 1029 VNIR images (see Section 3.3.2) for Mask R-CNN model training and validation. The object classes used for data labeling are described in Section 3.4.
In this work, we implemented Mask R-CNN with L1 as the loss function, ResNet50 as the backbone, Stochastic Gradient Descent (SGD) as the optimizer, and COCO pre-trained weights, using the Detectron2 library [102]. GaussianNoise, RandomGamma, RandomBrightness, and HorizontalFlip were applied as data augmentation functions to keep the proposed model efficient during the training and validation stages. The model was developed in Python, and all calculations were realized in Google Colab.
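A sketch of such a Detectron2 configuration is given below; the dataset names are hypothetical (they would first have to be registered in Detectron2's DatasetCatalog), and the solver values are assumptions rather than the exact configuration used for the reported results.

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))   # ResNet50-FPN backbone
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")    # COCO pre-trained weights
cfg.DATASETS.TRAIN = ("apples_vnir_train",)    # hypothetical, must be registered beforehand
cfg.DATASETS.TEST = ("apples_vnir_val",)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 4            # Healthy apple, Decay, Fungi, Spoiled apple
cfg.SOLVER.BASE_LR = 0.0025                    # SGD learning rate (assumed value)

trainer = DefaultTrainer(cfg)                  # Detectron2's default training loop uses SGD
trainer.resume_or_load(resume=False)
trainer.train()
```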
In our experiment, we apply cross-validation for Mask R-CNN model training on the dataset containing VNIR images. Cross-validation is a widespread technique that helps avoid overfitting during model training on big data. In our case, we deal with sequential images, i.e., one apple can appear in many images without any change in position, which may artificially improve the loss value during the training procedure. During cross-validation, the data is split into several groups, called folds, where each group is used for training and validation in turn. For example, if the dataset is separated into three folds, the pipeline is the following: (i) the first fold is the validation set, while the second and third folds form the train set; (ii) the first and third folds are the train set, and the second fold is the validation set; and (iii) the first and second folds are the train set, and the third fold is the validation set. The same pipeline applies to cross-validation with four or more folds. By default, the number of folds, also called k-folds, is usually set to five or ten, but it may differ. In this work, we set the number of folds to two, three, six, and nine. We show the mean Average Precision values for each k-fold configuration of the Mask R-CNN training in Table 2.
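The fold splitting itself can be sketched with scikit-learn as follows; whether the sequential images were shuffled before splitting, and the random seed, are assumptions of this example.

```python
import numpy as np
from sklearn.model_selection import KFold

image_ids = np.arange(1029)          # indices of the sequential VNIR images

for n_folds in (2, 3, 6, 9):         # the fold counts evaluated in this work
    kfold = KFold(n_splits=n_folds, shuffle=True, random_state=42)
    for fold, (train_idx, val_idx) in enumerate(kfold.split(image_ids)):
        train_set = image_ids[train_idx]
        val_set = image_ids[val_idx]
        # train_model(train_set) and evaluate(val_set) would be called here
        print(f"{n_folds}-fold CV, fold {fold}: "
              f"{len(train_set)} train / {len(val_set)} validation images")
```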
The results for each object class (per-category segmentation) during Mask R-CNN training across all folds are given in Table 3 and Table 4. We used the mAP and F1-score metrics to evaluate the segmentation quality during model training for each fold distribution. Table 3 and Table 4 present the mean mAP and F1-score values for each fold, respectively. As can be seen, increasing the number of folds leads to higher metric values and segmentation accuracy. This demonstrates the advantage of the cross-validation technique over an ordinary split of the data into training and validation sets. Figure 16 shows examples of VNIR images with predicted annotations of the object classes (see Section 3.4) acquired during the Mask R-CNN model validation. Here we show examples of synthesized and annotated images from the k-folds = 9 distribution, which yields the best mAP and F1-score values (highlighted in bold in the k-folds = 9 column of Table 3 and Table 4). Even though the postharvest decay zones (Decay object class in Table 3 and Table 4) and the fungal areas (Fungi object class) are detected with low F1-score values (58.861 and 40.968, respectively), the trained Mask R-CNN model detects and segments spoiled apples (Spoiled apple object class), containing decayed zones, fungal areas, or both, with an F1-score of 94.800, which is promising.
Taking into account the results of the Mask R-CNN evaluation on real VNIR imaging data and the results of the Pix2PixHD evaluation in comparison to the other GAN-based models (see Section 4.1), we assemble the proposed pipeline for segmentation of generated VNIR images. To estimate it, we acquired a dataset containing only 456 sequential RGB images without corresponding VNIR images (see Section 3.4). The images were acquired in the greenhouse (see Section 3.3.2) under the same environmental conditions (temperature range from 35 °C to 40 °C, and RH of 70%, respectively). In order to simulate possible situations during real storage, spoiled apples with decayed and fungi zones were added to healthy (non-damaged) apples. The concept is as follows: (i) we utilize a set of RGB images as input data; (ii) these RGB images are passed through a GAN-based model (specifically, Pix2PixHD with pre-trained weights in our case); (iii) VNIR images are generated from the input RGB images using Pix2PixHD; and (iv) the generated VNIR images are fed into a CNN-based model (specifically, Mask R-CNN with pre-trained weights) to obtain these images with predicted annotation masks. Figure 17 shows examples of images which were synthesized and segmented with the proposed pipeline. As can be seen in Figure 17b,c, the proposed approach helps detect and segment the decayed zones separately from the fungi zones in the stored apples. All computations were also performed in Google Colab.
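The four-step pipeline can be summarized in a short function; `pix2pixhd_generator` and `mask_rcnn_predictor` are placeholder callables standing for the two pre-trained models (each assumed to accept a single image or batch and return its predictions), so this is a sketch of the data flow rather than the exact inference code.

```python
import torch

def rgb_to_segmented_vnir(rgb_batch, pix2pixhd_generator, mask_rcnn_predictor):
    """Steps (i)-(iv) above: RGB images in, generated VNIR images with
    predicted instance masks out. Both models are assumed to be already
    loaded with their pre-trained weights."""
    with torch.no_grad():
        vnir_batch = pix2pixhd_generator(rgb_batch)     # (ii)-(iii) VNIR generation
    predictions = []
    for vnir in vnir_batch:
        predictions.append(mask_rcnn_predictor(vnir))   # (iv) instance segmentation
    return vnir_batch, predictions
```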

4.3. Early Postharvest Decay Detection in Stored Apples Using Generated VNIR Imaging Data on an Embedded System

To evaluate the applicability of the GAN- and CNN-based models in real-life scenarios, we conduct an experiment using the NVIDIA Jetson Nano embedded system [103]. The goal of the experiment is to validate the models' ability to handle video streams with varying frames per second (FPS).
We used 100 RGB images with an input size of 256 pixels. The GAN model was used to generate VNIR images from the input images and processed the 100 images at an average rate of 17 FPS. The generated images were then processed with Mask R-CNN, resulting in an average rate of 0.420 FPS. The low FPS of Mask R-CNN can be attributed to its complexity compared to Pix2PixHD: as a two-stage detection model that performs instance segmentation by detecting objects and generating pixel-level masks for each object, it requires more computational resources. Figure 18 shows examples of VNIR images generated and segmented using the NVIDIA Jetson Nano based on the input RGB data.
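A simple way to measure such an average FPS per stage is sketched below, where `pix2pixhd_generator` and `mask_rcnn_predictor` are hypothetical callables for the two stages.

```python
import time

def measure_fps(model_fn, images):
    """Average frames per second of a callable over a set of images,
    as used here for both the Pix2PixHD and the Mask R-CNN stages."""
    start = time.perf_counter()
    for image in images:
        model_fn(image)
    elapsed = time.perf_counter() - start
    return len(images) / elapsed

# fps_gan = measure_fps(pix2pixhd_generator, rgb_images)    # ~17 FPS reported
# fps_seg = measure_fps(mask_rcnn_predictor, vnir_images)   # ~0.42 FPS reported
```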

4.4. Discussion

In this section, we compare our results with other relevant research works in the field of application of NIR imaging data and deep learning techniques for early postharvest decay and fungal zones prediction in stored apples. The proposed approach is based on the joint application of GAN and CNN techniques for artificial generation and subsequent segmentation of VNIR images. However, in order to segment the decayed and fungal zones in artificially generated VNIR images, we had to train and validate a CNN technique on the real VNIR images containing these zones in stored apples. To perform this, we acquired the dataset of VNIR images (see Section 3.3) and then trained and validated the Mask R-CNN model (see Section 4.2).
Taking into account the ability of Mask R-CNN to provide multi-class instance and semantic segmentation (see Section 3.1.4), we trained the model not only to detect and identify the quality of an apple (Healthy apple or Spoiled apple, see Section 3.4), but also to detect and predict the decayed and fungal zones separately from each other. The novelty is that the model is trained and validated to identify the quality of stored apples by taking into account the presence of decayed and fungal areas in the apples themselves. In this context, an apple is classified as a Spoiled apple if it contains decayed or fungal zones, whether separate or combined. Conversely, if an apple does not exhibit any decayed or fungal zones prior to the storage stage, i.e., during the VNIR image collection, it is classified as a Healthy apple. However, if decayed and/or fungal zones emerge in the apple during the storage stage, its classification transitions from Healthy apple to Spoiled apple.
Relevant works in this area can be classified into three main groups according to main tasks: (i) defective apples detection based on the internal quality parameters [104,105]; (ii) early defect detection in apples [104,106]; and (iii) early fungi detection in apples [73,107,108]. Table 5 presents a comparative study of these works.
The authors applied various tools and methods based on machine learning for detecting defective and diseased zones in wide NIR ranges (400–2350 nm overall) with detailed spectral information on the diseased zones. The approach most relevant and similar to the current research is reported in [104], where a YOLO v4 model is implemented in a sorting machine for real-time detection of defects in "Red Fuji", "Golden Delicious", and "Granny Smith" apples. The authors used RGB and corresponding NIR images in the 850 nm range of the apples on the machine's sorting line. Moreover, the ability of the trained YOLO v4 models to detect 'calyx' and 'stem' zones with bounding boxes separately from 'defect' zones was demonstrated. In this work, we applied Mask R-CNN not only to detect (with bounding boxes) and segment (with masks) the decayed and fungal areas in stored apples, but also to identify the quality of an apple as diseased (Spoiled apple) if such zones are detected by the model. The F1-score and mAP values for the Decay and Fungi zones are not that high. These problems can be addressed in our future work by obtaining more VNIR images containing fungal and decayed areas in order to increase the data representation during model validation. On the other hand, the results for Spoiled apple (an apple containing Fungi and/or Decay zones) segmentation are 98.350 and 98.375, respectively, which is promising. Finally, the proposed approach targets apple quality control during the storage stage, i.e., before sending the stored apples to a fruit sorting machine. A system that can generate VNIR images with segmented fungal and decayed zones, if they occur in stored apples, from only the input RGB images and without a multispectral or hyperspectral camera, can be applied as an additional stage of fruit and vegetable control before sending them to a sorting machine.
In [106], the authors compared Faster R-CNN, YOLO v3-Tiny, and YOLO 5s models for early decay (or bruise) detection in apples. The approach proposed in this work showed promising results in terms of the mAP metric (98.350 for the Healthy apple category in our Mask R-CNN validation, against 96.900 for Faster R-CNN, 99.100 for YOLO v3-Tiny, and 96.600 for YOLO 5s), and the selected model was trained to segment the decayed and fungal zones in apples, while the authors in [106] trained their models to classify apples as having no bruise (‘No bruise’), a small bruise (‘Mild bruise’), or a significant bruise (‘Severe bruise’). The authors also acquired NIR images in the spectral range of 900–2350 nm, while in this work images at 838 nm were used in order to make sure that the diseased zones in the VNIR images are visible in the RGB images as well.
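For reference, per-category mAP values such as those compared above can be obtained with the COCO-style evaluator shipped with Detectron2 [102], which was used in this study for Mask R-CNN. The sketch below is a minimal example assuming a registered test split named apples_vnir_test and a trained weight file; the dataset name, paths, and thresholds are placeholders rather than the exact configuration used here.

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
from detectron2.data import build_detection_test_loader
from detectron2.evaluation import COCOEvaluator, inference_on_dataset

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 4           # Healthy apple, Spoiled apple, Decay, Fungi
cfg.MODEL.WEIGHTS = "output/model_final.pth"  # placeholder path to trained weights
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5

predictor = DefaultPredictor(cfg)

# "apples_vnir_test" is a placeholder name for a COCO-style registered VNIR test split.
evaluator = COCOEvaluator("apples_vnir_test", output_dir="./eval")
loader = build_detection_test_loader(cfg, "apples_vnir_test")
results = inference_on_dataset(predictor.model, loader, evaluator)

# Mask (segmentation) AP averaged over IoU 0.50:0.95, plus AP50/AP75 and per-size APs.
print(results["segm"])
```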
In [105], the authors trained and validated U-Net and an improved U-Net model for defect segmentation in VNIR images of apples. In this work, we demonstrated the semantic segmentation of decayed and fungal areas with an advanced experimental methodology: we simulated ordinary and extreme storage conditions during the paired RGB and VNIR image collection procedures. Taking this into account, we achieved a competitive F1-score (94.800) for spoiled apple segmentation.
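For completeness, the F1-score used throughout this comparison is the harmonic mean of precision and recall computed per object class; a minimal, generic sketch with hypothetical detection counts is shown below (it is not the evaluation script used in this work).

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall for a single object class."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

# Hypothetical true-positive / false-positive / false-negative counts for one class.
print(round(100.0 * f1_score(tp=54, fp=38, fn=37), 3))
```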
We have demonstrated the potential for postharvest decay and fungi prediction in stored apples. However, the approach can be scaled to other crops widely used in food production, e.g., carrots, tomatoes, cucumbers, or bananas. For example, a system that generates and segments VNIR images can be applied to the segmentation and prediction of such fungi as Sclerotinia sclerotiorum or Botrytis cinerea. ‘Sclerotinia’ and ‘Botrytis’ fungal zones have similar morphology and, if they occur in plants, it is a nontrivial task to distinguish one fungal variety from the other using only RGB imagery or visual estimation of the fungal traits by eye [109]. A system supplied with a trained and validated DL technique based on GAN and CNN models can assist the user with additional spectral information about each fungus acquired from the generated VNIR images, which is useful for more precise antifungal activities during food quality control.
Another potential scenario is applying the proposed approach to preharvest disease and defect detection for plants growing both in natural environments and in artificially controlled systems. For example, it could be deployed on a mobile robotic platform or an unmanned aerial vehicle without a hyperspectral camera but with an embedded system that generates and segments NIR imaging data from the input RGB data. However, the DL techniques should be trained, tested and validated thoroughly, as the proposed system has to not only separate diseased plants from healthy ones, but also identify the kind of defect (damage, decay, fungal variety) and suggest subsequent processing of the spoiled fruit.
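A minimal sketch of such a two-stage pipeline is given below, assuming the RGB-to-VNIR generator has been exported as a TorchScript module and the Mask R-CNN weights were trained with Detectron2 [102]; the file names, class count, and pre/post-processing are illustrative assumptions, not the deployment code used in this study.

```python
import cv2
import numpy as np
import torch
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stage 1: RGB-to-VNIR translation (placeholder TorchScript export of the generator).
generator = torch.jit.load("pix2pixhd_generator.pt", map_location=device).eval()

# Stage 2: Mask R-CNN segmentation of the synthesized VNIR image.
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 4       # Healthy apple, Spoiled apple, Decay, Fungi
cfg.MODEL.WEIGHTS = "mask_rcnn_vnir.pth"  # placeholder path to trained weights
cfg.MODEL.DEVICE = device
predictor = DefaultPredictor(cfg)

def rgb_to_vnir(frame_bgr: np.ndarray) -> np.ndarray:
    """Translate an 8-bit BGR frame into an 8-bit VNIR-like image."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB).astype(np.float32) / 127.5 - 1.0
    x = torch.from_numpy(rgb).permute(2, 0, 1).unsqueeze(0).to(device)
    with torch.no_grad():
        y = generator(x)[0].clamp(-1.0, 1.0)
    return ((y.permute(1, 2, 0).cpu().numpy() + 1.0) * 127.5).astype(np.uint8)

frame = cv2.imread("apples_rgb.png")                        # placeholder input frame
vnir = rgb_to_vnir(frame)
outputs = predictor(cv2.cvtColor(vnir, cv2.COLOR_RGB2BGR))  # predictor expects BGR input
print(outputs["instances"].pred_classes)                    # indices of detected zone classes
```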

5. Conclusions

NIR imagery provides detailed information about diseased areas in stored fruits, which is why hyperspectral cameras with hundreds of narrow spectral bands are used for food quality monitoring at postharvest stages. However, hyperspectral devices are expensive and inconvenient for farmers and sellers to use. In this article, we have presented an approach based on GAN and CNN DL techniques for early detection and prediction of postharvest decay zones and fungal areas in stored apples using synthesized and segmented VNIR images.
The conclusions of this work are as follows:
  • The analysis of Pix2Pix, CycleGAN, and Pix2PixHD models, which are widely used GAN techniques, and their application to a dataset containing 1305 paired sequential RGB images and 1305 sequential VNIR images of stored apples of different varieties and various pre-treatments. The images were acquired under full and partial illumination in order to simulate real storage conditions.
  • Comparison of the real VNIR images with the VNIR images synthesized by the selected GAN-based models. The VNIR images generated via Pix2PixHD achieved a score of 0.972 for the SSIM metric (a minimal sketch of how such similarity scores are computed is given after this list).
  • The training and testing of Mask R-CNN on another dataset containing only 1029 sequential VNIR images of apples under violated storage conditions. Within this test, an F1-score of 58.861 was achieved for the postharvest decay zones and 40.968 for the fungal zones. The spoiled apples with decayed and fungal zones were detected and segmented with an F1-score of 94.800.
  • Testing of the proposed solution on an embedded system with AI capabilities. We used 100 RGB images of stored apples as input data for an NVIDIA Jetson Nano: VNIR image generation by Pix2PixHD ran at 17 FPS, while detection and segmentation by Mask R-CNN achieved 0.42 FPS.
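As referenced in the list above, the image-similarity scores used to compare real and synthesized VNIR images (MAE, MSE, PSNR, SSIM) can be reproduced with standard scikit-image routines. The sketch below assumes a pair of aligned three-channel image files with placeholder names and only illustrates how such values are typically computed.

```python
import numpy as np
from skimage.io import imread
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Placeholder file names for a real VNIR frame and its GAN-synthesized counterpart.
real = imread("real_vnir.png").astype(np.float64) / 255.0
fake = imread("generated_vnir.png").astype(np.float64) / 255.0

mae = np.mean(np.abs(real - fake))
mse = np.mean((real - fake) ** 2)
psnr = peak_signal_noise_ratio(real, fake, data_range=1.0)
ssim = structural_similarity(real, fake, data_range=1.0, channel_axis=-1)

print(f"MAE={mae:.3f}  MSE={mse:.5f}  PSNR={psnr:.3f}  SSIM={ssim:.3f}")
```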
The proposed approach is a promising solution able to substitute for expensive hyperspectral imaging devices in early postharvest decay prediction tasks within postharvest food quality control.

Author Contributions

Conceptualization, N.S., D.S. and A.S.; methodology, N.S.; software, N.S. and I.S.; validation, N.S., I.S. and M.S.; formal analysis, N.S. and D.S.; investigation, N.S.; resources, N.S.; data curation, N.S.; writing—original draft preparation, N.S. and I.S.; writing—review and editing, N.S., I.S., M.S., D.S. and A.S.; visualization, N.S. and I.S.; supervision, A.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
NIR: Near Infrared Image
VNIR: Visible Near Infrared Image
AI: Artificial Intelligence
CV: Computer Vision
ML: Machine Learning
SVM: Support Vector Machines
RF: Random Forest
kNN: K-Nearest Neighbors Algorithm
GTB: Gradient Tree Boosting
DL: Deep Learning
CNN: Convolutional Neural Network
GAN: Generative Adversarial Network
ROI: Regions of Interest
SBC: Single Board Computer
RH: Relative Humidity

References

  1. United Nations Data about Current World Population. Available online: https://www.worldometers.info/world-population/ (accessed on 26 June 2023).
  2. United Nations Data on Current and Prospected World Population. Available online: https://population.un.org/wpp/Graphs/Probabilistic/POP/TOT/900 (accessed on 26 June 2023).
  3. Ullah, S.; Hashmi, M.; Lee, J.; Youk, J.H.; Kim, I.S. Recent Advances in Pre-harvest, Post-harvest, Intelligent, Smart, Active, and Multifunctional Food Packaging. Fibers Polym. 2022, 23, 2063–2074. [Google Scholar] [CrossRef]
  4. Coradi, P.C.; Maldaner, V.; Lutz, É.; da Silva Daí, P.V.; Teodoro, P.E. Influences of drying temperature and storage conditions for preserving the quality of maize postharvest on laboratory and field scales. Sci. Rep. 2020, 10, 22006. [Google Scholar] [CrossRef] [PubMed]
  5. Mohammed, M.; Alqahtani, N.; El-Shafie, H. Development and evaluation of an ultrasonic humidifier to control humidity in a cold storage room for postharvest quality management of dates. Foods 2021, 10, 949. [Google Scholar] [CrossRef] [PubMed]
  6. Sun, X.; Baldwin, E.; Bai, J. Applications of gaseous chlorine dioxide on postharvest handling and storage of fruits and vegetables—A review. Food Control 2019, 95, 18–26. [Google Scholar] [CrossRef]
  7. Yahia, E.M.; Fonseca, J.M.; Kitinoja, L. Postharvest losses and waste. In Postharvest Technology of Perishable Horticultural Commodities; Elsevier: Amsterdam, The Netherlands, 2019; pp. 43–69. [Google Scholar]
  8. Palumbo, M.; Attolico, G.; Capozzi, V.; Cozzolino, R.; Corvino, A.; de Chiara, M.L.V.; Pace, B.; Pelosi, S.; Ricci, I.; Romaniello, R.; et al. Emerging Postharvest Technologies to Enhance the Shelf-Life of Fruit and Vegetables: An Overview. Foods 2022, 11, 3925. [Google Scholar] [CrossRef]
  9. Elik, A.; Yanik, D.K.; Istanbullu, Y.; Guzelsoy, N.A.; Yavuz, A.; Gogus, F. Strategies to reduce post-harvest losses for fruits and vegetables. Strategies 2019, 5, 29–39. [Google Scholar]
  10. FAO Data on Global Apple Production. Available online: https://www.fao.org/faostat/en/#data/QCL/visualize (accessed on 26 June 2023).
  11. Harker, F.; Feng, J.; Johnston, J.; Gamble, J.; Alavi, M.; Hall, M.; Chheang, S. Influence of postharvest water loss on apple quality: The use of a sensory panel to verify destructive and non-destructive instrumental measurements of texture. Postharvest Biol. Technol. 2019, 148, 32–37. [Google Scholar] [CrossRef]
  12. de Andrade, J.C.; Galvan, D.; Effting, L.; Tessaro, L.; Aquino, A.; Conte-Junior, C.A. Multiclass Pesticide Residues in Fruits and Vegetables from Brazil: A Systematic Review of Sample Preparation Until Post-Harvest. Crit. Rev. Anal. Chem. 2021, 1–23. Available online: https://www.tandfonline.com/doi/abs/10.1080/10408347.2021.2013157 (accessed on 26 June 2023).
  13. Bratu, A.M.; Petrus, M.; Popa, C. Monitoring of post-harvest maturation processes inside stored fruit using photoacoustic gas sensing spectroscopy. Materials 2020, 13, 2694. [Google Scholar] [CrossRef]
  14. Sottocornola, G.; Baric, S.; Nocker, M.; Stella, F.; Zanker, M. Picture-based and conversational decision support to diagnose post-harvest apple diseases. Expert Syst. Appl. 2022, 189, 116052. [Google Scholar] [CrossRef]
  15. Malvandi, A.; Feng, H.; Kamruzzaman, M. Application of NIR spectroscopy and multivariate analysis for Non-destructive evaluation of apple moisture content during ultrasonic drying. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2022, 269, 120733. [Google Scholar] [CrossRef]
  16. Schlie, T.P.; Dierend, W.; Koepcke, D.; Rath, T. Detecting low-oxygen stress of stored apples using chlorophyll fluorescence imaging and histogram division. Postharvest Biol. Technol. 2022, 189, 111901. [Google Scholar] [CrossRef]
  17. Wang, L.; Huang, J.; Li, Z.; Liu, D.; Fan, J. A review of the polyphenols extraction from apple pomace: Novel technologies and techniques of cell disintegration. Crit. Rev. Food Sci. Nutr. 2022, 1–14. [Google Scholar] [CrossRef] [PubMed]
  18. Wu, X.; Fauconnier, M.L.; Bi, J. Characterization and Discrimination of Apples by Flash GC E-Nose: Geographical Regions and Botanical Origins Studies in China. Foods 2022, 11, 1631. [Google Scholar] [CrossRef] [PubMed]
  19. Biasi, A.; Zhimo, V.Y.; Kumar, A.; Abdelfattah, A.; Salim, S.; Feygenberg, O.; Wisniewski, M.; Droby, S. Changes in the fungal community assembly of apple fruit following postharvest application of the yeast biocontrol agent Metschnikowia fructicola. Horticulturae 2021, 7, 360. [Google Scholar] [CrossRef]
  20. Bartholomew, H.P.; Lichtner, F.J.; Bradshaw, M.; Gaskins, V.L.; Fonseca, J.M.; Bennett, J.W.; Jurick, W.M. Comparative Penicillium spp. Transcriptomics: Conserved Pathways and Processes Revealed in Ungerminated Conidia and during Postharvest Apple Fruit Decay. Microorganisms 2022, 10, 2414. [Google Scholar] [CrossRef]
  21. Morales-Cedeno, L.R.; del Carmen Orozco-Mosqueda, M.; Loeza-Lara, P.D.; Parra-Cota, F.I.; de Los Santos-Villalobos, S.; Santoyo, G. Plant growth-promoting bacterial endophytes as biocontrol agents of pre-and post-harvest diseases: Fundamentals, methods of application and future perspectives. Microbiol. Res. 2021, 242, 126612. [Google Scholar] [CrossRef]
  22. Nikparvar, B.; Thill, J.C. Machine learning of spatial data. ISPRS Int. J. Geo-Inf. 2021, 10, 600. [Google Scholar] [CrossRef]
  23. Zhang, Y.; Liu, M.; Yu, F.; Zeng, T.; Wang, Y. An o-shape neural network with attention modules to detect junctions in biomedical images without segmentation. IEEE J. Biomed. Health Inform. 2021, 26, 774–785. [Google Scholar] [CrossRef]
  24. Zhao, S.; Blaabjerg, F.; Wang, H. An overview of artificial intelligence applications for power electronics. IEEE Trans. Power Electron. 2020, 36, 4633–4658. [Google Scholar] [CrossRef]
  25. Meshram, V.; Patil, K.; Meshram, V.; Hanchate, D.; Ramkteke, S. Machine learning in agriculture domain: A state-of-art survey. Artif. Intell. Life Sci. 2021, 1, 100010. [Google Scholar] [CrossRef]
  26. Kakani, V.; Nguyen, V.H.; Kumar, B.P.; Kim, H.; Pasupuleti, V.R. A critical review on computer vision and artificial intelligence in food industry. J. Agric. Food Res. 2020, 2, 100033. [Google Scholar] [CrossRef]
  27. Rasti, S.; Bleakley, C.J.; Holden, N.; Whetton, R.; Langton, D.; O’Hare, G. A survey of high resolution image processing techniques for cereal crop growth monitoring. Inf. Process. Agric. 2022, 9, 300–315. [Google Scholar] [CrossRef]
  28. Tang, Y.; Qiu, J.; Zhang, Y.; Wu, D.; Cao, Y.; Zhao, K.; Zhu, L. Optimization strategies of fruit detection to overcome the challenge of unstructured background in field orchard environment: A review. Precis. Agric. 2023, 24, 1183–1219. [Google Scholar] [CrossRef]
  29. Ouhami, M.; Hafiane, A.; Es-Saady, Y.; El Hajji, M.; Canals, R. Computer vision, IoT and data fusion for crop disease detection using machine learning: A survey and ongoing research. Remote Sens. 2021, 13, 2486. [Google Scholar] [CrossRef]
  30. Wu, Z.; Chen, Y.; Zhao, B.; Kang, X.; Ding, Y. Review of weed detection methods based on computer vision. Sensors 2021, 21, 3647. [Google Scholar] [CrossRef] [PubMed]
  31. Mendigoria, C.H.; Aquino, H.; Concepcion, R.; Alajas, O.J.; Dadios, E.; Sybingco, E. Vision-based postharvest analysis of musa acuminata using feature-based machine learning and deep transfer networks. In Proceedings of the 2021 IEEE 9th Region 10 Humanitarian Technology Conference (R10-HTC), Bangalore, India, 30 September–2 October 2021; pp. 1–6. [Google Scholar]
  32. Bucio, F.; Isaza, C.; Gonzalez, E.; De Paz, J.Z.; Sierra, J.R.; Rivera, E.A. Non-Destructive Post-Harvest Tomato Mass Estimation Model Based on Its Area via Computer Vision and Error Minimization Approaches. IEEE Access 2022, 10, 100247–100256. [Google Scholar] [CrossRef]
  33. Ropelewska, E. Postharvest Authentication of Potato Cultivars Using Machine Learning to Provide High-Quality Products. Chem. Proc. 2022, 10, 30. [Google Scholar]
  34. Isola, P.; Zhu, J.Y.; Zhou, T.; Efros, A.A. Image-to-Image Translation with Conditional Adversarial Networks. arXiv 2018, arXiv:1611.07004. [Google Scholar]
  35. Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. arXiv 2020, arXiv:1703.10593. [Google Scholar]
  36. Wang, T.C.; Liu, M.Y.; Zhu, J.Y.; Tao, A.; Kautz, J.; Catanzaro, B. High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs. arXiv 2018, arXiv:1711.11585. [Google Scholar]
  37. Christovam, L.E.; Shimabukuro, M.H.; Galo, M.d.L.B.; Honkavaara, E. Pix2pix conditional generative adversarial network with MLP loss function for cloud removal in a cropland time series. Remote Sens. 2022, 14, 144. [Google Scholar] [CrossRef]
  38. de Lima, D.C.; Saqui, D.; Mpinda, S.A.T.; Saito, J.H. Pix2pix network to estimate agricultural near infrared images from rgb data. Can. J. Remote Sens. 2022, 48, 299–315. [Google Scholar] [CrossRef]
  39. Farooque, A.A.; Afzaal, H.; Benlamri, R.; Al-Naemi, S.; MacDonald, E.; Abbas, F.; MacLeod, K.; Ali, H. Red-green-blue to normalized difference vegetation index translation: A robust and inexpensive approach for vegetation monitoring using machine vision and generative adversarial networks. Precis. Agric. 2023, 24, 1097–1115. [Google Scholar] [CrossRef]
  40. Bertoglio, R.; Mazzucchelli, A.; Catalano, N.; Matteucci, M. A comparative study of Fourier transform and CycleGAN as domain adaptation techniques for weed segmentation. Smart Agric. Technol. 2023, 4, 100188. [Google Scholar] [CrossRef]
  41. Jung, D.H.; Kim, C.Y.; Lee, T.S.; Park, S.H. Depth image conversion model based on CycleGAN for growing tomato truss identification. Plant Methods 2022, 18, 83. [Google Scholar] [CrossRef] [PubMed]
  42. van Marrewijk, B.M.; Polder, G.; Kootstra, G. Investigation of the added value of CycleGAN on the plant pathology dataset. IFAC-PapersOnLine 2022, 55, 89–94. [Google Scholar] [CrossRef]
  43. Yang, J.; Zhang, T.; Fang, C.; Zheng, H. A defencing algorithm based on deep learning improves the detection accuracy of caged chickens. Comput. Electron. Agric. 2023, 204, 107501. [Google Scholar] [CrossRef]
  44. Tsuchikawa, S.; Ma, T.; Inagaki, T. Application of near-infrared spectroscopy to agriculture and forestry. Anal. Sci. 2022, 38, 635–642. [Google Scholar] [CrossRef]
  45. Stasenko, N.; Savinov, M.; Burlutskiy, V.; Pukalchik, M.; Somov, A. Deep Learning for Postharvest Decay Prediction in Apples. In Proceedings of the IECON 2021—47th Annual Conference of the IEEE Industrial Electronics Society, Toronto, ON, Canada, 13–16 October 2021; pp. 1–6. [Google Scholar]
  46. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
  47. Yurtkulu, S.C.; Şahin, Y.H.; Unal, G. Semantic Segmentation with Extended DeepLabv3 Architecture. In Proceedings of the 2019 27th Signal Processing and Communications Applications Conference (SIU), Sivas, Turkey, 24–26 April 2019; pp. 1–4. [Google Scholar]
  48. Assunção, E.; Gaspar, P.D.; Mesquita, R.; Simões, M.P.; Alibabaei, K.; Veiros, A.; Proença, H. Real-Time Weed Control Application Using a Jetson Nano Edge Device and a Spray Mechanism. Remote Sens. 2022, 14, 4217. [Google Scholar] [CrossRef]
  49. Saddik, A.; Latif, R.; Taher, F.; El Ouardi, A.; Elhoseny, M. Mapping Agricultural Soil in Greenhouse Using an Autonomous Low-Cost Robot and Precise Monitoring. Sustainability 2022, 14, 15539. [Google Scholar] [CrossRef]
  50. de Aguiar, A.S.P.; dos Santos, F.B.N.; dos Santos, L.C.F.; de Jesus Filipe, V.M.; de Sousa, A.J.M. Vineyard trunk detection using deep learning–An experimental device benchmark. Comput. Electron. Agric. 2020, 175, 105535. [Google Scholar] [CrossRef]
  51. Mazzia, V.; Khaliq, A.; Salvetti, F.; Chiaberge, M. Real-time apple detection system using embedded systems with hardware accelerators: An edge AI application. IEEE Access 2020, 8, 9102–9114. [Google Scholar] [CrossRef]
  52. Beegam, K.S.; Shenoy, M.V.; Chaturvedi, N. Hybrid consensus and recovery block-based detection of ripe coffee cherry bunches using RGB-D sensor. IEEE Sens. J. 2021, 22, 732–740. [Google Scholar] [CrossRef]
  53. Zhang, W.; Liu, Y.; Chen, K.; Li, H.; Duan, Y.; Wu, W.; Shi, Y.; Guo, W. Lightweight fruit-detection algorithm for edge computing applications. Front. Plant Sci. 2021, 12, 740936. [Google Scholar] [CrossRef]
  54. Vilcamiza, G.; Trelles, N.; Vinces, L.; Oliden, J. A coffee bean classifier system by roast quality using convolutional neural networks and computer vision implemented in an NVIDIA Jetson Nano. In Proceedings of the 2022 Congreso Internacional de Innovación y Tendencias en Ingeniería (CONIITI), Bogota, Colombia, 5–7 October 2022; pp. 1–6. [Google Scholar]
  55. Fan, K.J.; Su, W.H. Applications of Fluorescence Spectroscopy, RGB-and MultiSpectral Imaging for Quality Determinations of White Meat: A Review. Biosensors 2022, 12, 76. [Google Scholar] [CrossRef]
  56. Zou, X.; Zhang, Y.; Lin, R.; Gong, G.; Wang, S.; Zhu, S.; Wang, Z. Pixel-level Bayer-type colour router based on metasurfaces. Nat. Commun. 2022, 13, 3288. [Google Scholar] [CrossRef]
  57. Rivero Mesa, A.; Chiang, J. Non-invasive grading system for banana tiers using RGB imaging and deep learning. In Proceedings of the 2021 7th International Conference on Computing and Artificial Intelligence, Tianjin, China, 23–26 April 2021; pp. 113–118. [Google Scholar]
  58. Nasiri, A.; Taheri-Garavand, A.; Zhang, Y.D. Image-based deep learning automated sorting of date fruit. Postharvest Biol. Technol. 2019, 153, 133–141. [Google Scholar] [CrossRef]
  59. Deng, L.; Li, J.; Han, Z. Online defect detection and automatic grading of carrots using computer vision combined with deep learning methods. LWT 2021, 149, 111832. [Google Scholar] [CrossRef]
  60. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. arXiv 2017, arXiv:1707.01083. [Google Scholar]
  61. Wu, F.; Yang, Z.; Mo, X.; Wu, Z.; Tang, W.; Duan, J.; Zou, X. Detection and counting of banana bunches by integrating deep learning and classic image-processing algorithms. Comput. Electron. Agric. 2023, 209, 107827. [Google Scholar] [CrossRef]
  62. Baheti, B.; Innani, S.; Gajre, S.; Talbar, S. Semantic scene segmentation in unstructured environment with modified DeepLabV3+. Pattern Recognit. Lett. 2020, 138, 223–229. [Google Scholar] [CrossRef]
  63. Wu, F.; Duan, J.; Ai, P.; Chen, Z.; Yang, Z.; Zou, X. Rachis detection and three-dimensional localization of cut off point for vision-based banana robot. Comput. Electron. Agric. 2022, 198, 107079. [Google Scholar] [CrossRef]
  64. Buyukarikan, B.; Ulker, E. Classification of physiological disorders in apples fruit using a hybrid model based on convolutional neural network and machine learning methods. Neural Comput. Appl. 2022, 34, 16973–16988. [Google Scholar] [CrossRef]
  65. Li, J.; Zheng, K.; Yao, J.; Gao, L.; Hong, D. Deep unsupervised blind hyperspectral and multispectral data fusion. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar] [CrossRef]
  66. Liang, J.; Li, X.; Zhu, P.; Xu, N.; He, Y. Hyperspectral reflectance imaging combined with multivariate analysis for diagnosis of Sclerotinia stem rot on Arabidopsis thaliana leaves. Appl. Sci. 2019, 9, 2092. [Google Scholar] [CrossRef] [Green Version]
  67. Vashpanov, Y.; Heo, G.; Kim, Y.; Venkel, T.; Son, J.Y. Detecting green mold pathogens on lemons using hyperspectral images. Appl. Sci. 2020, 10, 1209. [Google Scholar] [CrossRef] [Green Version]
  68. Fahrentrapp, J.; Ria, F.; Geilhausen, M.; Panassiti, B. Detection of gray mold leaf infections prior to visual symptom appearance using a five-band multispectral sensor. Front. Plant Sci. 2019, 10, 628. [Google Scholar] [CrossRef] [Green Version]
  69. Wan, L.; Li, H.; Li, C.; Wang, A.; Yang, Y.; Wang, P. Hyperspectral Sensing of Plant Diseases: Principle and Methods. Agronomy 2022, 12, 1451. [Google Scholar] [CrossRef]
  70. Błaszczyk, U.; Wyrzykowska, S.; Gąstoł, M. Application of Bioactive Coatings with Killer Yeasts to Control Post-Harvest Apple Decay Caused by Botrytis cinerea and Penicillium italicum. Foods 2022, 11, 1868. [Google Scholar] [CrossRef]
  71. Amaral Carneiro, G.; Walcher, M.; Baric, S. Cadophora luteo-olivacea isolated from apple (Malus domestica) fruit with post-harvest side rot symptoms in northern Italy. Eur. J. Plant Pathol. 2022, 162, 247–255. [Google Scholar] [CrossRef]
  72. Ghooshkhaneh, N.G.; Golzarian, M.R.; Mollazade, K. VIS-NIR spectroscopy for detection of citrus core rot caused by Alternaria alternata. Food Control 2023, 144, 109320. [Google Scholar] [CrossRef]
  73. Ekramirad, N.; Khaled, A.Y.; Doyle, L.E.; Loeb, J.R.; Donohue, K.D.; Villanueva, R.T.; Adedeji, A.A. Nondestructive detection of codling moth infestation in apples using pixel-based nir hyperspectral imaging with machine learning and feature selection. Foods 2022, 11, 8. [Google Scholar] [CrossRef]
  74. Jiang, B.; He, J.; Yang, S.; Fu, H.; Li, T.; Song, H.; He, D. Fusion of machine vision technology and AlexNet-CNNs deep learning network for the detection of postharvest apple pesticide residues. Artif. Intell. Agric. 2019, 1, 1–8. [Google Scholar] [CrossRef]
  75. Huang, C.; Li, X.; Wen, Y. AN OTSU image segmentation based on fruitfly optimization algorithm. Alex. Eng. J. 2021, 60, 183–188. [Google Scholar] [CrossRef]
  76. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef] [Green Version]
  77. Zhang, D.; Zhou, X.; Zhang, J.; Lan, Y.; Xu, C.; Liang, D. Detection of rice sheath blight using an unmanned aerial system with high-resolution color and multispectral imaging. PLoS ONE 2018, 13, e0187470. [Google Scholar] [CrossRef] [Green Version]
  78. Sun, Y.; Xiao, H.; Tu, S.; Sun, K.; Pan, L.; Tu, K. Detecting decayed peach using a rotating hyperspectral imaging testbed. LWT 2018, 87, 326–332. [Google Scholar] [CrossRef]
  79. Li, J.; Luo, W.; Wang, Z.; Fan, S. Early detection of decay on apples using hyperspectral reflectance imaging combining both principal component analysis and improved watershed segmentation method. Postharvest Biol. Technol. 2019, 149, 235–246. [Google Scholar] [CrossRef]
  80. Hyperspectral Imaging Systems Market Size Report. Available online: https://www.grandviewresearch.com/industry-analysis/hyperspectral-imaging-systems-market (accessed on 26 June 2023).
  81. Mirza, M.; Osindero, S. Conditional generative adversarial nets. arXiv 2014, arXiv:1411.1784. [Google Scholar]
  82. Illarionova, S.; Shadrin, D.; Trekin, A.; Ignatiev, V.; Oseledets, I. Generation of the nir spectral band for satellite images with convolutional neural networks. Sensors 2021, 21, 5646. [Google Scholar] [CrossRef]
  83. Lu, Y.; Chen, D.; Olaniyi, E.; Huang, Y. Generative adversarial networks (GANs) for image augmentation in agriculture: A systematic review. Comput. Electron. Agric. 2022, 200, 107208. [Google Scholar] [CrossRef]
  84. Khatri, K.; Asha, C.; D’Souza, J.M. Detection of Animals in Thermal Imagery for Surveillance using GAN and Object Detection Framework. In Proceedings of the 2022 International Conference for Advancement in Technology (ICONAT), Goa, India, 21–22 January 2022; pp. 1–6. [Google Scholar]
  85. Valerio Giuffrida, M.; Scharr, H.; Tsaftaris, S.A. Arigan: Synthetic arabidopsis plants using generative adversarial network. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy, 22–29 October 2017; pp. 2064–2071. [Google Scholar]
  86. Tang, H.; Xu, D.; Yan, Y.; Corso, J.J.; Torr, P.H.; Sebe, N. Multi-channel attention selection gans for guided image-to-image translation. arXiv 2020, arXiv:2002.01048. [Google Scholar] [CrossRef]
  87. Guo, Z.; Shao, M.; Li, S. Image-to-image translation using an offset-based multi-scale codes GAN encoder. Vis. Comput. 2023, 1–17. [Google Scholar] [CrossRef]
  88. Fard, A.S.; Reutens, D.C.; Vegh, V. From CNNs to GANs for cross-modality medical image estimation. Comput. Biol. Med. 2022, 146, 105556. [Google Scholar] [CrossRef] [PubMed]
  89. Saharia, C.; Chan, W.; Chang, H.; Lee, C.; Ho, J.; Salimans, T.; Fleet, D.; Norouzi, M. Palette: Image-to-image diffusion models. In Proceedings of the ACM SIGGRAPH 2022 Conference Proceedings, Vancouver, BC, Canada, 7–11 August 2022; pp. 1–10. [Google Scholar]
  90. Kshatriya, B.S.; Dubey, S.R.; Sarma, H.; Chaudhary, K.; Gurjar, M.R.; Rai, R.; Manchanda, S. Semantic Map Injected GAN Training for Image-to-Image Translation. In Proceedings of the Satellite Workshops of ICVGIP 2021, Gandhinagar, India, 8–10 December 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 235–249. [Google Scholar]
  91. Sa, I.; Lim, J.Y.; Ahn, H.S.; MacDonald, B. deepNIR: Datasets for generating synthetic NIR images and improved fruit detection system using deep learning techniques. Sensors 2022, 22, 4721. [Google Scholar] [CrossRef] [PubMed]
  92. Li, C.; Wand, M. Precomputed real-time texture synthesis with markovian generative adversarial networks. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 702–716. [Google Scholar]
  93. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  94. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
  95. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Washington, DC, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
  96. Girshick, R. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
  97. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 91–99. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  98. Saletnik, B.; Zaguła, G.; Saletnik, A.; Bajcar, M.; Słysz, E.; Puchalski, C. Method for Prolonging the Shelf Life of Apples after Storage. Appl. Sci. 2022, 12, 3975. [Google Scholar] [CrossRef]
  99. Nesteruk, S.; Illarionova, S.; Akhtyamov, T.; Shadrin, D.; Somov, A.; Pukalchik, M.; Oseledets, I. XtremeAugment: Getting More From Your Data Through Combination of Image Collection and Image Augmentation. IEEE Access 2022, 10, 24010–24028. [Google Scholar] [CrossRef]
  100. Martínez-Zamora, L.; Castillejo, N.; Artés-Hernández, F. Postharvest UV-B and photoperiod with blue+ red LEDs as strategies to stimulate carotenogenesis in bell peppers. Appl. Sci. 2021, 11, 3736. [Google Scholar] [CrossRef]
  101. Supervisely Data Annotator. Available online: https://app.supervise.ly (accessed on 26 June 2023).
  102. Wu, Y.; Kirillov, A.; Massa, F.; Lo, W.Y.; Girshick, R. Detectron2. 2019. Available online: https://github.com/facebookresearch/detectron2 (accessed on 26 June 2023).
  103. NVIDIA. Jetson Modules Technical Specificatons. 2023. Available online: https://developer.nvidia.com/embedded/jetson-modules (accessed on 26 June 2023).
  104. Fan, S.; Liang, X.; Huang, W.; Zhang, V.J.; Pang, Q.; He, X.; Li, L.; Zhang, C. Real-time defects detection for apple sorting using NIR cameras with pruning-based YOLOV4 network. Comput. Electron. Agric. 2022, 193, 106715. [Google Scholar] [CrossRef]
  105. Tang, Y.; Bai, H.; Sun, L.; Wang, Y.; Hou, J.; Huo, Y.; Min, R. Multi-Band-Image Based Detection of Apple Surface Defect Using Machine Vision and Deep Learning. Horticulturae 2022, 8, 666. [Google Scholar] [CrossRef]
  106. Yuan, Y.; Yang, Z.; Liu, H.; Wang, H.; Li, J.; Zhao, L. Detection of early bruise in apple using near-infrared camera imaging technology combined with deep learning. Infrared Phys. Technol. 2022, 127, 104442. [Google Scholar] [CrossRef]
  107. Zhang, Z.; Pu, Y.; Wei, Z.; Liu, H.; Zhang, D.; Zhang, B.; Zhang, Z.; Zhao, J.; Hu, J. Combination of interactance and transmittance modes of Vis/NIR spectroscopy improved the performance of PLS-DA model for moldy apple core. Infrared Phys. Technol. 2022, 126, 104366. [Google Scholar] [CrossRef]
  108. Hu, Q.X.; Tian, J.; Fang, Y. Detection of moldy cores in apples with near-infrared transmission spectroscopy based on wavelet and BP network. Int. J. Pattern Recognit. Artif. Intell. 2019, 33, 1950020. [Google Scholar] [CrossRef]
  109. Sadek, M.E.; Shabana, Y.M.; Sayed-Ahmed, K.; Abou Tabl, A.H. Antifungal activities of sulfur and copper nanoparticles against cucumber postharvest diseases caused by Botrytis cinerea and Sclerotinia sclerotiorum. J. Fungi 2022, 8, 412. [Google Scholar] [CrossRef]
Figure 1. Diagram summarizing the proposed approach for the application of segmented VNIR imagery data via deep learning for early postharvest decay prediction in apples.
Figure 2. Pix2Pix block diagram.
Figure 3. CycleGAN block diagram.
Figure 4. Pix2PixHD generator block diagram.
Figure 5. Mask R-CNN block diagram.
Figure 6. Apples selected for data collection.
Figure 7. Experimental testbed for paired RGB and VNIR image capturing.
Figure 8. Types of images obtained during the experiments: (A)—RGB image of apples acquired under the full illumination; (B)—VNIR image of apples acquired under the full illumination (838 nm); (C)—VNIR image of apples acquired under the partial illumination (838 nm).
Figure 9. VNIR image of apples selected for data collection.
Figure 10. Experimental greenhouse for data acquisition.
Figure 11. RGB image of apples selected for data collection.
Figure 12. The example of image annotation and objects classes in Supervisely.
Figure 13. GAN-based models evaluation: (a) CycleGAN discriminator loss values during the training; (b) Pix2Pix discriminator loss values during the training; and (c) Pix2PixHD discriminator loss values during the training.
Figure 14. Examples of VNIR generated images in comparison to original VNIR image: (a) obtained under full illumination; and (b) obtained under partial illumination.
Figure 15. Examples of synthesized VNIR images with Pix2PixHD model weights.
Figure 16. Comparison of object classes annotation in real VNIR images (a,b, on the left with ‘Annotated image’ label) to predicted object annotations (a,b, on the right with ‘Predicted annotations’ label) during Mask R-CNN model training.
Figure 17. Synthesized VNIR images (a–c) segmentation with Mask R-CNN model.
Figure 18. Generated and segmented VNIR images (a–c) using Jetson Nano.
Table 1. Image-to-image models comparison for RGB to VNIR images generation.
Models | MAE | MAPE | MSE | PSNR | SSIM
CycleGAN | 0.067 | 0.105 | 0.01127 | 27.375 | 0.856
Pix2Pix | 0.004 | 0.006 | 0.00003 | 46.433 | 0.955
Pix2PixHD | 0.004 | 0.006 | 0.00003 | 46.859 | 0.972
Table 2. Comparison of Average Precision for Mask R-CNN model.
k-Folds | mAP | mAP50 | mAP75 | mAP_S | mAP_M | mAP_L
2 | 64.251 | 90.205 | 65.606 | 37.202 | 75.980 | 97.412
3 | 67.652 | 90.354 | 65.348 | 35.400 | 75.290 | 96.290
6 | 67.026 | 90.950 | 67.055 | 38.188 | 74.609 | 98.871
9 | 67.993 | 91.120 | 64.871 | 31.575 | 75.181 | 97.257
Table 3. Results on per-category segmentation by Mask R-CNN using mAP metric.
Category | k-Folds = 2 | k-Folds = 3 | k-Folds = 6 | k-Folds = 9
Healthy apple | 94.785 | 95.154 | 93.951 | 98.350
Spoiled apple | 87.839 | 92.567 | 93.678 | 93.997
Decay | 53.509 | 53.408 | 54.620 | 57.562
Fungi | 31.581 | 30.609 | 34.285 | 39.967
Table 4. Results on per-category segmentation by Mask R-CNN using F1-score metric.
Category | k-Folds = 2 | k-Folds = 3 | k-Folds = 6 | k-Folds = 9
Healthy apple | 95.640 | 95.589 | 94.799 | 98.375
Spoiled apple | 88.120 | 93.134 | 94.689 | 94.800
Decay | 53.309 | 53.213 | 54.850 | 58.861
Fungi | 31.686 | 37.247 | 35.126 | 40.968
Table 5. Comparative table of relevant research works.
References | Task | NIR Images Range, nm | Technique | Metric | Value
[104] | Real-time apple defect inspection | 850 | YOLO v4 | F1-score | 92.000
[105] | Apples surface defect segmentation | 460–842 | U-Net | F1-score | 87.000
[105] | Apples surface defect segmentation | 460–842 | Improved U-Net | F1-score | 91.000
[106] | Early bruise detection in apples | 900–2350 | Faster R-CNN | mAP | 96.900
[106] | Early bruise detection in apples | 900–2350 | YOLO v3-Tiny | mAP | 99.100
[106] | Early bruise detection in apples | 900–2350 | YOLO 5s | mAP | 99.600
[107] | Moldy core detection in apples | 400–850 | CARS-PLS-DA model | Accuracy | 87.880
[73] | Codling moth detection in apples | 900–1700 | Gradient tree boosting | F1-score | 97.000
[108] | Moldy core detection in apples | 200–1100 | BP-ANN | Accuracy | 95.000
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
