Non-Destructive Identification and Estimation of Granulation in Honey Pomelo Using Visible and Near-Infrared Transmittance Spectroscopy Combined with Machine Vision Technology

Sun, Xiaopeng; Xu, Sai; Lu, Huazhong

doi:10.3390/app10165399

Xiaopeng Sun ¹, Sai Xu ²,* and Huazhong Lu ¹,³

¹ College of Engineering, South China Agricultural University, Guangzhou 510640, China
² Public Monitoring Center for Agro-Product of Guangdong Academy of Agricultural Sciences, Guangzhou 510640, China
³ Guangdong Academy of Agricultural Sciences, Guangzhou 510640, China
* Author to whom correspondence should be addressed.

Appl. Sci. 2020, 10(16), 5399; https://doi.org/10.3390/app10165399

Submission received: 1 June 2020 / Revised: 31 July 2020 / Accepted: 3 August 2020 / Published: 5 August 2020

(This article belongs to the Section Optics and Lasers)

Abstract

Granulation is a physiological disorder of the juice sacs of citrus fruit that makes them hard and dry, reducing the internal quality of the fruit. Honey pomelo is a thick-skinned citrus fruit, and the extent of granulation is difficult to identify by observing the outer peel and fruit shape. In this study, a rapid, non-destructive testing method using visible and near-infrared transmittance spectroscopy combined with machine vision technology was applied to identify and estimate granulation inside the fruit. A total of 600 samples from different growth periods were harvested, and the fruit were divided into five classes according to granulation level. Spectral data were obtained in two ranges, 400–1100 nm and 900–1700 nm, by visible and near-infrared transmittance spectroscopy. In addition, chemometrics were used to analyze the chemical changes in soluble solid content (SSC), titratable acidity (TA), and moisture content (MC) caused by different granulation levels. Machine vision technology was used to rapidly estimate the external characteristics of the samples and to measure the physical changes in mass and volume caused by different granulation levels. Compared with single-source or traditional methods, the predictive performance of the multi-category classification models (PCA-SVM and PCA-GRNN) was significantly enhanced. In particular, the model accuracy rate (ARM) of PCA-GRNN was 99%, with classification accuracy (CA), classification sensitivity (CS), and classification specificity (CSP) of 0.9950, 0.9750, and 0.9934, respectively. The results showed that this method has great potential for the identification and estimation of granulation. Multi-source data fusion and a multi-category classification model with the smallest number of input neurons and acceptably high predictive performance are proposed for on-line applications. This method can be used effectively on-line for the non-destructive detection of fruit with granulation.

Keywords:

visible and near-infrared transmittance spectroscopy; machine vision technology; granulation; honey pomelo; multi-source data fusion; multi-category classification

1. Introduction

Honey pomelo is a citrus fruit with excellent flavor and high nutritional value. It is cultivated throughout Asia and extensively in China [1]. Citrus fruit can suffer from various physiological disorders, including granulation (also described as crystallization, sclerification, or dryness). Granulation was first described in 1934 by Bartholomew, Sinclair, and Raby [2] and was subsequently studied by researchers in other citrus-growing countries, including China [3]. In granulated fruit, the juice sacs become hard, dry, enlarged, and grayish, and yield little extractable juice [4,5,6]. The severity of the disorder increases during storage, as the stored fruit loses additional moisture from the juice vesicles. As granulation progresses, the organoleptic properties deteriorate rapidly, reducing the edible value of the fruit [7]. Granulation of citrus fruit has been found to correlate with secondary wall formation and cell wall thickening in granulated juice vesicles [5,8]. Harvest time and post-harvest storage time both affect the degree of granulation: late harvest and prolonged storage increase granulation in pomelo fruit, which reduces overall internal fruit quality [9,10].

Visible and near-infrared transmittance spectroscopy has been widely applied to identify and estimate internal quality attributes of fruit, such as soluble solids content (SSC) [11], maturity [12], and early decay [13]. However, few studies have applied it to granulation in citrus fruit. Chemometrics is the application of mathematics, statistics, and computer science to extract useful information from multivariate data; it includes exploratory data analysis, machine learning, data mining, and classification [14]. In this research, visible and near-infrared transmittance spectral data combined with chemometrics were first used to assess granulation with multi-category models (LDA, PCA-SVM, and PCA-GRNN) (Table A1 in Appendix A). The specific aims of this work were: (1) to evaluate the feasibility of multi-category models (LDA, PCA-SVM, and PCA-GRNN) for classifying honey pomelo fruit into five granulation levels; and (2) to compare the model accuracy rate (ARM) and predictive performance (CA, CS, and CSP) of the three models in discriminating the five granulation levels.

Various methods have been applied to assess internal fruit quality; however, methods with limited penetration depth cannot accurately assess internal quality. The predictive accuracy of transmittance spectroscopy for internal quality attributes is especially low for thick-skinned fruit, such as watermelon and pomelo [15]. To enhance model predictive performance, this research aimed to combine visible and near-infrared transmittance spectroscopy with machine vision technology to assess the degree of granulation in the samples. In recent years, machine vision technology has been widely applied in the agricultural and food industries, including for the assessment of fruit characteristics such as mass and volume [16,17,18,19]. Fruit volume can be used to predict harvest time [20], and differences in volume and mass correlate with granulation in citrus and can therefore be used for fruit quality testing [21]. Machine vision offers an effective and relatively inexpensive approach for accurately measuring fruit characteristics and can test large numbers of samples quickly and non-destructively.

In multi-source data fusion, character-level fusion is used to extract the principal components of the preprocessed spectral data, and data-level fusion is used to combine the external characteristics with these principal components. We propose multi-source data fusion and a multi-category classification model with a small number of input neurons and acceptably high classification accuracy for on-line fruit assessment. The goal of this research was to develop a rapid, accurate, and non-destructive method for on-line grading and detection of fruit with granulation using visible and near-infrared spectroscopy combined with machine vision.

2. Materials and Methods

2.1. Experimental Samples

Honey pomelos were grown and harvested in Meizhou, Guangdong, China. From 20 July to 31 October 2019, samples at various stages of granulation were harvested and immediately shipped to our laboratory in Guangzhou, Guangdong. All samples were stored at room temperature (19–21 °C) for 24 h before the experiment. Six hundred samples without external defects were used.

2.2. Acquisition of Image Information and Extraction of External Characteristics

For machine vision, we built an image information acquisition system (IIAS) (Figure 1). The system consists of a GigE camera (DFK 33GP006, Imaging Source, Germany), a camera lens (M0814-MP2, F1.4, f8 mm 2/3, China), an experimental box, a tray, two ring light sources (24V/580mA), two striped light sources (24V/580mA), and a computer equipped with “IC Capture” image acquisition software and MATLAB R2018a (MathWorks Inc., Natick, MA, USA). Graphical programming combined with MATLAB toolboxes was used to extract the external characteristics of the sample images.

The camera was calibrated using a calibration board with 110 mm squares and the Computer Vision System Toolbox 8.1 in MATLAB, from which the camera intrinsic and extrinsic parameters were obtained. The Image Processing Toolbox 10.2 in MATLAB was used to extract pomelo characteristics, including the length and width in pixels. Pixel values were converted to millimeters based on the camera intrinsics and extrinsics [22], and these millimeter values for length and width were used to estimate volume and mass. In this way, the external characteristics of honey pomelo were estimated from the images captured by the image acquisition system [23].
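A minimal sketch of the pixel-to-millimeter step, assuming an ideal pinhole camera with lens distortion already removed and the fruit outline lying in a plane parallel to the image plane at a known distance. Under those assumptions, the focal length in pixels (from the intrinsics) and that distance (from the extrinsics) are the only inputs. The function name and the numbers in the usage line are hypothetical; this is not the authors' MATLAB code.

```python
def pixels_to_mm(length_px: float, focal_px: float, depth_mm: float) -> float:
    """Convert an image length in pixels to millimetres (pinhole model).

    focal_px: focal length in pixels, taken from the camera intrinsics.
    depth_mm: camera-to-fruit distance, taken from the camera extrinsics.
    Assumes distortion-corrected images and a fronto-parallel fruit outline.
    """
    if focal_px <= 0 or depth_mm <= 0:
        raise ValueError("focal length and depth must be positive")
    return length_px * depth_mm / focal_px


# Hypothetical values: 900 px across, f = 2400 px, fruit plane 400 mm away.
print(pixels_to_mm(900, 2400, 400))  # 150.0
```

Because the fruit surface is curved, a single depth value is itself an approximation; a full calibration-based back-projection would treat each edge point separately.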

Each sample was approximated as an ellipsoid. The volume formula for an ellipsoid (V_ellipsoid) in Equation (2) is a modification of the volume of a sphere (V_sphere) in Equation (1).

V_{\mathrm{sphere}} = \frac{4}{3} \pi r^{3},

(1)

V_{\mathrm{ellipsoid}} = \frac{4}{3} \pi \frac{L}{2} \frac{W}{2} \frac{H}{2} \approx \frac{\pi L W^{2}}{6},

(2)

For a sphere, the length (L), width (W), and height (H) are all equal to the diameter (2r), so Equation (2) reduces to Equation (1); for an ellipsoid, the volume is calculated from L, W, and H, as shown in Figure 2a. For volume estimation, each section was regarded as a cone frustum whose top and bottom faces are standard circles, so W and H were set to the same value (Figure 2b); this gives the approximation on the right of Equation (2).
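The two forms of Equation (2), and its reduction to Equation (1) for a sphere with L = W = H = 2r, can be checked numerically with a short Python sketch. The function names and the example dimensions are illustrative.

```python
import math


def ellipsoid_volume(L: float, W: float, H: float) -> float:
    """Equation (2): ellipsoid volume from full axis lengths L, W, H."""
    return (4.0 / 3.0) * math.pi * (L / 2) * (W / 2) * (H / 2)


def ellipsoid_volume_round_section(L: float, W: float) -> float:
    """Right-hand side of Equation (2) with H = W: pi * L * W**2 / 6."""
    return math.pi * L * W ** 2 / 6


# Hypothetical fruit, 150 mm long and 140 mm wide; divide mm^3 by 1000 for cm^3.
print(round(ellipsoid_volume_round_section(150, 140) / 1000, 1))  # 1539.4
```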

As shown in Figure 2, to better approximate the shape of the samples, the volume can be estimated by summing the volumes of the individual cone frustums [24]. The volume of each cone frustum and of the whole fruit can be calculated using the following equations [21]:

V_{j} = \frac{1}{3} \pi \Delta L \left[ \left(\frac{W_{bj}}{2}\right)^{2} + \frac{W_{bj} W_{tj}}{4} + \left(\frac{W_{tj}}{2}\right)^{2} \right],

(3)

V = \sum_{j=1}^{n} V_{j} = \sum_{j=1}^{n} \frac{\pi \Delta L}{12} \left( W_{bj}^{2} + W_{bj} W_{tj} + W_{tj}^{2} \right),

(4)

where W_bj is the diameter of the bottom circle of the j-th frustum, in mm; W_tj is the diameter of its top circle, in mm; ΔL is the height of each cone frustum, in mm; n is the number of frustums; V_j is the volume of a cone frustum, in cm³ (the result in mm³ divided by 1000); and V is the estimated volume, in cm³.
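A Python sketch of Equations (3) and (4), assuming the image-derived widths are sampled at equally spaced positions along the fruit length, so that consecutive widths are the bottom and top diameters of one frustum. Widths and ΔL are in mm, and the result is converted to cm³. The function and the test profile are illustrative, not the authors' implementation.

```python
import math
from typing import Sequence


def frustum_volume_cm3(widths_mm: Sequence[float], dL_mm: float) -> float:
    """Sum of cone-frustum volumes (Equations (3)-(4)), returned in cm^3.

    widths_mm[j] and widths_mm[j + 1] are the bottom (W_bj) and top (W_tj)
    diameters of frustum j; dL_mm is the height of every frustum.
    """
    total_mm3 = 0.0
    for w_b, w_t in zip(widths_mm[:-1], widths_mm[1:]):
        total_mm3 += math.pi * dL_mm / 12.0 * (w_b ** 2 + w_b * w_t + w_t ** 2)
    return total_mm3 / 1000.0  # 1 cm^3 = 1000 mm^3


# Check against a sphere of radius 50 mm cut into 1000 slices.
r, n = 50.0, 1000
dL = 2 * r / n
profile = [2 * math.sqrt(max(r ** 2 - (-r + i * dL) ** 2, 0.0)) for i in range(n + 1)]
print(frustum_volume_cm3(profile, dL), 4 / 3 * math.pi * r ** 3 / 1000)
```

For a convex outline such as this, linear interpolation between slices slightly underestimates the true volume, and the error shrinks as ΔL decreases.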

In 2005, Tabatabaeefar and Rajabipour found a high correlation (a high coefficient of determination) between the mass and volume of apples [25]. Following this approach, Equation (5) describes the linear relationship between M_acc and V_acc, where k and b are the slope and intercept, respectively; these were then used to calculate M with Equation (6).

V_{\mathrm{acc}} = k M_{\mathrm{acc}} + b,

(5)

V = k M + b,

(6)

where M is the estimated mass, in g; M_acc is the measured mass, in g; and V_acc is the measured volume, in cm³.
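Equations (5) and (6) amount to fitting a straight line and then inverting it. A NumPy sketch, assuming ordinary least squares (the text only says "linear fitting"); the R² returned is the usual 1 − SS_res/SS_tot. The helper names and the volume in the usage line are hypothetical; k and b are the values reported in Section 3.2.

```python
import numpy as np


def fit_volume_on_mass(mass_g, volume_cm3):
    """Fit Equation (5), V_acc = k * M_acc + b, by least squares; return k, b, R^2."""
    m = np.asarray(mass_g, dtype=float)
    v = np.asarray(volume_cm3, dtype=float)
    k, b = np.polyfit(m, v, deg=1)  # coefficients come highest power first
    ss_res = np.sum((v - (k * m + b)) ** 2)
    ss_tot = np.sum((v - v.mean()) ** 2)
    return k, b, 1.0 - ss_res / ss_tot


def mass_from_volume(volume_cm3, k, b):
    """Solve Equation (6), V = k * M + b, for M given an image-based volume V."""
    return (np.asarray(volume_cm3, dtype=float) - b) / k


# k and b as reported in Section 3.2, with a hypothetical V of 1617.0 cm^3.
print(mass_from_volume(1617.0, 2.1076, -490.6325))
```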

2.3. Visible and Near-Infrared Transmittance Spectral Acquisition

Two portable spectrometers were used to measure the visible and near-infrared transmittance spectra of honey pomelo (Figure 3): a QE-pro spectrometer (Ocean Optics, USA) with an optical resolution of 0.14–7.7 nm, a signal-to-noise ratio of 1000:1, and a spectral range of 400–1100 nm; and a NIR-QUEST 512 spectrometer (Ocean Optics, USA) with an optical resolution of about 3 nm, a signal-to-noise ratio of 15,000:1, and a spectral range of 900–1700 nm. Each honey pomelo was placed on a tray inside the spectral acquisition box; the tray holds the fruit in a stable position. Ten tungsten halogen lamps (100 W each, with the beams aimed directly at the sample) were installed in the box. An integrating sphere, used for relative measurement of luminous flux, was mounted in the middle of the bottom of the box. Its inner wall was evenly coated with a white diffuse layer, and its outer wall had a window about 38 mm across. Transmitted light passes through the sample and the window, is reflected by the inner wall of the integrating sphere, and then reaches the detector, so the detector signal depends on the luminous flux of the light source. The integrating sphere is connected by two optical fibers, one to each spectrometer, and both spectrometers were connected to a computer via USB.

The acquisition parameters of the QE-pro and NIR-QUEST 512 spectrometers were set to 400 ms and 8 s, respectively. The system acquires spectra from opposite sides of the equator of each sample (Figure 3). Each sample was measured four times in total, and the average was used for data analysis. The white reference spectrum (T_w) was acquired by placing a white cylindrical Teflon block (diameter 60 mm, height 50 mm) on top of the window. The dark reference spectrum (T_d) was acquired with the light source turned off and the window of the integrating sphere completely covered. The transmittance measurement of each sample (T_s) was then converted to relative transmittance (T) using T_w and T_d, as shown in Equation (7):

T = \frac{T_{s} - T_{d}}{T_{w} - T_{d}} \times 100\%.

(7)
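Equation (7) is applied wavelength by wavelength. A NumPy sketch, assuming the four repeated scans of a sample are averaged before the conversion (the text says the average was used but not at which stage) and that all arrays share one wavelength grid. The names and numbers are illustrative.

```python
import numpy as np


def relative_transmittance(sample_scans, white, dark):
    """Equation (7): T = (T_s - T_d) / (T_w - T_d) * 100 at each wavelength.

    sample_scans: array (n_scans, n_wavelengths) of raw sample spectra,
    averaged here into T_s; white and dark: reference spectra T_w and T_d.
    """
    t_s = np.mean(np.atleast_2d(np.asarray(sample_scans, dtype=float)), axis=0)
    t_w = np.asarray(white, dtype=float)
    t_d = np.asarray(dark, dtype=float)
    return (t_s - t_d) / (t_w - t_d) * 100.0


scans = [[140, 290], [160, 310]]  # two hypothetical scans at two wavelengths
print(relative_transmittance(scans, white=[1100, 2100], dark=[100, 100]))  # [ 5. 10.]
```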

2.4. Spectral Preprocessing

Spectral preprocessing is required for accurate spectral analysis [26]. In addition to valid spectral information, original spectra contain irrelevant information, interference, and background noise. Spectral preprocessing was used to reduce or eliminate these unwanted signals from all spectra, and includes steps of normalization (mean normalization treatment), multiplicative scatter correction (MSC), and the Savitzky–Golay smoothing (polynomial order: 3; smoothing points: 15) [27,28,29]. All spectral preprocessing methods were performed using Unscrambler v10.4 (CAMO PROCESS AS, Oslo, Norway).
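The three preprocessing steps can be sketched in NumPy, in the order listed above. This is not the Unscrambler implementation: mean normalization is assumed to divide each spectrum by its own mean, MSC uses the mean spectrum of the set as its reference, and the Savitzky–Golay filter (polynomial order 3, 15 points, as in the text) pads the spectrum ends with their edge values, which differs from how Unscrambler or SciPy treat the ends. Function names are illustrative.

```python
import numpy as np


def mean_normalize(spectra):
    """Divide each spectrum (row) by its own mean intensity (assumed definition)."""
    spectra = np.asarray(spectra, dtype=float)
    return spectra / spectra.mean(axis=1, keepdims=True)


def msc(spectra, reference=None):
    """Multiplicative scatter correction against the mean (or a given) spectrum."""
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        slope, offset = np.polyfit(ref, row, deg=1)  # row ~ offset + slope * ref
        corrected[i] = (row - offset) / slope
    return corrected


def savgol_smooth(spectra, window=15, polyorder=3):
    """Savitzky-Golay smoothing; ends padded with edge values (a simplification)."""
    half = window // 2
    x = np.arange(-half, half + 1)
    vander = np.vander(x, polyorder + 1, increasing=True)
    coeffs = np.linalg.pinv(vander)[0]  # weights giving the fitted value at the centre
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    padded = np.pad(spectra, ((0, 0), (half, half)), mode="edge")
    return np.stack([np.convolve(row, coeffs[::-1], mode="valid") for row in padded])


def preprocess(spectra):
    """Normalization -> MSC -> Savitzky-Golay smoothing."""
    return savgol_smooth(msc(mean_normalize(spectra)))


X = np.random.default_rng(0).random((5, 200)) + 1.0  # hypothetical spectra
print(preprocess(X).shape)  # (5, 200)
```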

2.5. Physical and Chemical Measurement

Each sample was labeled and weighed on a balance (HTP312, Hochoice, Shanghai, China). The width and length of each sample were measured with a digital caliper (0–300 mm, IWOWN, China), and the actual volume was measured with the water displacement method (WDM) [30]. Juice from the measured area was obtained by squeezing the pulp and filtering it through gauze, and SSC was recorded with a digital refractometer (ATAGO, PAL-GrapeMust (Brix), Japan). TA was measured with an automatic titrator (SCHOTTGERATE, ConsortC831, Belgium) and expressed as the mass percentage of citric acid [31]. MC was measured according to the Theanjumpol method [32]. Approximately 60 g of pulp was randomly taken from three different parts of each sample and dried in a hot-air oven (FD53/E2, Binder, Germany) at 70 °C for 96 h, until the weight remained constant. The mean of these measurements was taken as the MC value of the sample.
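The MC arithmetic is not spelled out above. A small Python sketch, assuming the common wet-basis definition (water lost divided by fresh mass) and taking the per-sample value as the mean over the pulp portions; this may differ in detail from the Theanjumpol method [32], and the names and masses are hypothetical.

```python
def moisture_content_wet_basis(fresh_g: float, dry_g: float) -> float:
    """Moisture content (%) on a wet basis: (fresh - dry) / fresh * 100."""
    if fresh_g <= 0 or dry_g < 0 or dry_g > fresh_g:
        raise ValueError("expected 0 <= dry mass <= fresh mass, fresh mass > 0")
    return (fresh_g - dry_g) / fresh_g * 100.0


def sample_moisture_content(portions):
    """Mean wet-basis MC over (fresh_g, dry_g) pairs taken from one fruit."""
    values = [moisture_content_wet_basis(fresh, dry) for fresh, dry in portions]
    return sum(values) / len(values)


# Three hypothetical pulp portions of 20 g each, weighed before and after drying.
print(sample_moisture_content([(20.0, 2.0), (20.0, 3.0), (20.0, 1.0)]))
```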

2.6. Discriminant Analysis Algorithms

2.6.1. Linear Discriminant Analysis

Linear discriminant analysis (LDA) is a common discriminant analysis method that finds optimal boundaries between classes by maximizing between-class separability while minimizing within-class variability [33]. LDA projects high-dimensional data onto a low-dimensional space in which samples of the same class cluster together and samples of different classes are maximally separated [34].
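A compact NumPy sketch of Fisher LDA, for illustration only (the LDA software and settings used in the study are not stated). The within-class scatter is whitened, the leading eigenvectors of the whitened between-class scatter give the projection, and each sample is assigned to the nearest class centroid in the projected space. With all K − 1 discriminant directions, this nearest-centroid rule matches LDA with equal class priors. The sketch assumes the within-class scatter matrix is invertible, which for 1500-variable spectra generally requires reducing the dimensionality first. Function names are illustrative.

```python
import numpy as np


def lda_fit(X, y, n_components):
    """Fisher LDA projection W, class labels, and class centroids in LDA space."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - overall_mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Whiten with Sw^(-1/2) so the problem becomes a symmetric eigenproblem.
    evals, evecs = np.linalg.eigh(Sw)
    Sw_inv_sqrt = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
    _, vecs = np.linalg.eigh(Sw_inv_sqrt @ Sb @ Sw_inv_sqrt)  # ascending eigenvalues
    W = Sw_inv_sqrt @ vecs[:, ::-1][:, :n_components]  # largest eigenvalues first
    centroids = np.stack([(X[y == c] @ W).mean(axis=0) for c in classes])
    return W, classes, centroids


def lda_predict(X, W, classes, centroids):
    """Assign each sample to the nearest class centroid in the LDA space."""
    Z = np.asarray(X, dtype=float) @ W
    dist = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(dist, axis=1)]
```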

2.6.2. Discriminant Analysis of PCA-SVM

A support vector machine (SVM) classifier is a supervised learning algorithm that tends to perform better after data normalization [35,36]. SVM achieves high accuracy on linearly separable data and, through kernel functions, can also handle high-dimensional nonlinear problems such as the classification of nonlinear data [37,38]. Common kernel functions include the radial basis function (RBF) kernel, the linear kernel, and the polynomial kernel; in this work, an RBF kernel was used for multi-category classification [38]. The visible and near-infrared transmittance spectra of each sample comprised 1500 variables. All spectral data of the measured samples were combined directly and their features were calculated; this retains the most information, which should improve accuracy [39]. However, the larger amount of data to transfer and process slows down the SVM classifier. Therefore, before SVM discriminant analysis, principal component analysis (PCA) was used to extract principal components from the visible and near-infrared spectra and thereby reduce their dimensionality. As shown in Figure 4, the first six principal components together accounted for 100% of the total contribution, indicating that they can represent the entire spectral data set. Compared with SVM alone, PCA-SVM is therefore the better classifier choice because it reduces the computational requirements.
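A NumPy sketch of the PCA step, computed by singular value decomposition of the mean-centered calibration spectra. The cumulative explained-variance ratio is one way to read a plot like Figure 4, and prediction-set spectra are projected with the calibration mean and loadings. Keeping six components follows the text; the function names and random data are illustrative, and the SVM stage is not shown.

```python
import numpy as np


def pca_fit(X, n_components):
    """Return the mean, the first n_components loadings, and explained-variance ratios."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    ratio = s ** 2 / np.sum(s ** 2)  # share of variance carried by each component
    return mean, Vt[:n_components], ratio


def pca_transform(X, mean, components):
    """Project spectra onto the principal components (the scores)."""
    return (np.asarray(X, dtype=float) - mean) @ components.T


# Hypothetical calibration spectra: 400 samples x 1500 variables of random numbers.
rng = np.random.default_rng(0)
X_cal = rng.normal(size=(400, 1500))
mean, comps, ratio = pca_fit(X_cal, n_components=6)
print(pca_transform(X_cal, mean, comps).shape, np.cumsum(ratio)[:6])
```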

2.6.3. Discriminant Analysis of PCA-GRNN

A generalized regression neural network (GRNN) is a type of radial basis neural network [40]. GRNN has strong nonlinear mapping ability, a flexible network structure, and high fault tolerance and robustness, which make it suitable for nonlinear problems. GRNN has a four-layer structure consisting of an input layer, a pattern layer, a summation layer, and an output layer (Figure 5). Its structure is similar to that of a radial basis function (RBF) neural network, but it has better approximation ability, faster learning, and better predictive ability for small sample sizes.

The number of neurons in the input layer equals the dimension of the input vector of the learning samples. Each input neuron is a simple distribution unit that passes the input variables directly to the pattern layer. The number of neurons in the pattern layer equals the number of learning samples, because each pattern neuron corresponds to one sample. The transfer function of the pattern-layer neurons is given in Equation (8):

p_{i} = \exp\left[-\frac{(X - X_{i})^{T} (X - X_{i})}{2\delta^{2}}\right], \quad i = 1, 2, \dots, n

(8)

where X is the input variable of the model; X_i is the learning sample corresponding to the i-th neuron; n is the number of learning samples; and δ is the width factor of the Gaussian function, also known as the smoothing factor.

The summation layer uses two types of summation. The first is the simple sum of the outputs of all pattern-layer neurons, given by Equation (9). The second is a weighted sum of the outputs of all pattern-layer neurons, given by Equation (10), where Y_ij is the j-th element of the output sample Y_i corresponding to the i-th pattern neuron.

S_{D} = \sum_{i=1}^{n} \exp\left[-\frac{(X - X_{i})^{T} (X - X_{i})}{2\delta^{2}}\right],

(9)

S_{Nj} = \sum_{i=1}^{n} Y_{ij} \exp\left[-\frac{(X - X_{i})^{T} (X - X_{i})}{2\delta^{2}}\right], \quad j = 1, 2, \dots, k

(10)

The number of neurons in the output layer equals the dimension k of the output vector of the samples. Each output neuron divides the weighted sum from the summation layer by the simple sum, so the output of the j-th neuron is the j-th element y_j of the estimate, as described by Equation (11).

y_{j} = \frac{S_{Nj}}{S_{D}} = \frac{\sum_{i=1}^{n} Y_{ij} \exp\left[-\frac{(X - X_{i})^{T} (X - X_{i})}{2\delta^{2}}\right]}{\sum_{i=1}^{n} \exp\left[-\frac{(X - X_{i})^{T} (X - X_{i})}{2\delta^{2}}\right]}, \quad j = 1, 2, \dots, k

(11)

MATLAB's “newgrnn” function was used to design the PCA-GRNN classifier, with PCA reducing the dimensionality of the input layer. newgrnn takes three arguments, P, T, and spread: P contains the input vectors (its dimension sets the number of input-layer neurons), T contains the expected (target) outputs, and spread is the spread constant of the radial basis functions, which was set to 1. The predicted output of the PCA-GRNN can be compared with the actual value to evaluate the predictive performance of the model.
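Equations (8)–(11) translate directly into a few NumPy array operations. The sketch below is not MATLAB's newgrnn: it uses δ directly (MATLAB's spread is parameterized differently), and it assumes the five granulation classes are one-hot encoded as the outputs Y_i, with the predicted class taken as the largest output. Subtracting each query's smallest squared distance before exponentiating avoids numerical underflow; the resulting common factor cancels in the ratio of Equation (11). Function names are illustrative.

```python
import numpy as np


def grnn_predict(X_train, Y_train, X_query, delta):
    """GRNN outputs y_j (Equation (11)) for each query row.

    X_train: (n, d) learning samples X_i, one pattern neuron each
    Y_train: (n, k) output samples Y_i (e.g. one-hot class codes)
    X_query: (m, d) inputs X;  delta: smoothing factor
    """
    X_train = np.asarray(X_train, dtype=float)
    Y_train = np.asarray(Y_train, dtype=float)
    X_query = np.atleast_2d(np.asarray(X_query, dtype=float))
    # Squared distances (X - X_i)^T (X - X_i), shape (m, n).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Shift by the row minimum for numerical stability; the factor cancels below.
    d2 -= d2.min(axis=1, keepdims=True)
    P = np.exp(-d2 / (2.0 * delta ** 2))  # pattern layer, Eq. (8) up to a common factor
    S_D = P.sum(axis=1, keepdims=True)    # simple summation, Eq. (9)
    S_N = P @ Y_train                     # weighted summation, Eq. (10)
    return S_N / S_D                      # output layer, Eq. (11)


def grnn_classify(X_train, labels, X_query, delta, n_classes):
    """One-hot encode integer labels 0..n_classes-1 and return the argmax class."""
    Y = np.eye(n_classes)[np.asarray(labels)]
    return np.argmax(grnn_predict(X_train, Y, X_query, delta), axis=1)
```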

2.7. Modeling Methods and Model Evaluation

In this study, the 600 samples were divided into a calibration set and a prediction set at a ratio of 2:1 using the Kennard–Stone method [41], giving 400 calibration samples and 200 prediction samples. Because visible and near-infrared spectra constitute a wide-band response [42], qualitative analysis requires chemometrics and discriminant models (LDA, PCA-SVM, and PCA-GRNN) [43]. The samples were assigned to five granulation levels, from none to more than 75% granulation, designated A, B, C, D, and E. Classification accuracy, the percentage of all samples that were classified correctly, was used to estimate model performance [44]. The accuracy rate of a model (ARM) was defined as follows [12]:

\mathrm{ARM}\ (\%) = \frac{\text{number of accurately classified samples}}{\text{number of total samples}} \times 100,

(12)
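The Kennard–Stone split used for the calibration and prediction sets can be sketched as follows: start from the two most distant samples, then repeatedly add the sample whose nearest already-selected neighbor is farthest away, until the calibration set is full (400 of 600 here); the rest form the prediction set. Euclidean distance is assumed, and the text does not say whether it was computed on raw spectra, preprocessed spectra, or PCA scores. The function names are illustrative.

```python
import numpy as np


def kennard_stone(X, n_select):
    """Indices of n_select samples chosen by the Kennard-Stone algorithm (n_select >= 2)."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    # Pairwise Euclidean distances via |a|^2 + |b|^2 - 2ab (avoids a huge 3-D array).
    dist = np.sqrt(np.clip(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0, None))
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_select:
        # Distance from each remaining sample to its closest selected sample ...
        nearest = dist[np.ix_(remaining, selected)].min(axis=1)
        # ... and the sample for which that distance is largest joins the set.
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return np.array(selected)


def split_calibration_prediction(X, n_calibration):
    """Return (calibration_indices, prediction_indices)."""
    cal = kennard_stone(X, n_calibration)
    pred = np.setdiff1d(np.arange(len(X)), cal)
    return cal, pred
```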

When the numbers of positive and negative samples differ, confusion matrices, which cross-tabulate actual and predicted categories, can be used to evaluate classification models [45]. For example, if Class A is defined as positive, then Classes B, C, D, and E are negative. True positives (TP) are Class A samples correctly classified as Class A. True negatives (TN) are negative samples correctly assigned to a negative class. False positives (FP) are negative samples incorrectly classified as Class A. False negatives (FN) are Class A samples incorrectly assigned to one of the negative classes. Based on the confusion matrices, predictive performance was described by classification accuracy (CA), classification sensitivity (CS), and classification specificity (CSP), as defined by Equations (13)–(15), respectively. Values of CA, CS, and CSP closer to 1 indicate more accurate classification.

\mathrm{CA} = \frac{TP + TN}{TP + TN + FP + FN},

(13)

\mathrm{CS} = \frac{TP}{TP + FN},

(14)

\mathrm{CSP} = \frac{TN}{TN + FP},

(15)
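A NumPy sketch computing ARM (Equation (12)) and the one-versus-rest CA, CS, and CSP of Equations (13)–(15) for every class at once. It assumes rows of the confusion matrix are actual classes and columns are predicted classes; the 3 × 3 matrix in the usage lines is hypothetical, not data from Table 4.

```python
import numpy as np


def arm_percent(cm):
    """Equation (12): correctly classified (diagonal) over all samples, in percent."""
    cm = np.asarray(cm, dtype=float)
    return np.trace(cm) / cm.sum() * 100.0


def one_vs_rest_metrics(cm):
    """Per-class CA, CS and CSP (Equations (13)-(15)); rows = actual, columns = predicted."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp  # samples of the class predicted as another class
    fp = cm.sum(axis=0) - tp  # samples of other classes predicted as this class
    tn = total - tp - fn - fp
    ca = (tp + tn) / total
    cs = tp / (tp + fn)
    csp = tn / (tn + fp)
    return ca, cs, csp


cm = [[8, 2, 0],
      [1, 9, 0],
      [0, 0, 10]]
print(arm_percent(cm))  # 90.0
ca, cs, csp = one_vs_rest_metrics(cm)
print(ca, cs, csp)
```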

3. Results and Discussion

3.1. Fruit Characteristics and Spectral Interpretation

Increasing granulation in honey pomelo affects both mass and volume. Consistent with the decreasing trend in mass and volume, width and length also tended to decrease (Figure 6 and Table 1). These changes in fruit characteristics indicate that external features extracted from the samples can be used to discriminate the degree of granulation.

From Class A to Class E, as the granulation level increased, SSC and TA gradually decreased (Table 1). Granulation often accelerates the decrease in SSC and TA in citrus fruit, including pomelo, navel oranges, and Valencia oranges [7,46]. Granulation is evidenced by juice vesicles that become tough, dry, and colorless [4,47], which is easily seen in honey pomelo. The MC of the samples decreased as dryness and toughness increased, and the pulp became less red and more white (Figure 6).

As shown in Figure 7a,b, no obvious absorption peaks were detected in the visible region (400–780 nm) of the raw and mean spectra. In the near-infrared region, typical overlapping absorption peaks correspond to C-H, O-H, and N-H bonds of water and carbohydrates in fruit [48]. In the short-wave near-infrared (SW-NIR) region (780–1100 nm), an overlapping peak was detected at approximately 813 nm. The mean transmittance spectra of the samples at 750–820 nm are related to the third and fourth overtones of C-H stretching vibrations and the first overtone of C=C stretching vibration [49]. The two spectrometers overlap in the 900–1100 nm range, but because their instrument parameters differ, the peaks in this range are centered at 947 nm and 948 nm; both correspond to water absorption bands (O-H) [50].

As shown in Figure 7c,d, the long-wave near-infrared (LW-NIR) spectra covered 1100–1700 nm, with a maximum absorption peak at approximately 1283 nm, attributed to the second overtone of the C-H vibration band [51]. The absorption peak around 1410 nm corresponds to lignin-related absorption and to methylene C-H and C-H of R(CH₂)_nR groups [52]. Citrus fruit granulation is related to changes in the cell wall structure of juice vesicles that may increase the content of cell wall components, including lignin, hemicellulose, and pectin [5,8,53].

3.2. Estimation of Mass and Volume

The actual mass (M_acc) and actual volume (V_acc) of all samples were measured and used to calculate the slope (k) and intercept (b) in Equation (5). Linear fitting gave k = 2.1076 and b = −490.6325, with a coefficient of determination (R²) of 0.9048 (Figure 8), indicating a strong linear relationship between M_acc and V_acc. Using the IIAS, V was calculated with Equations (3) and (4), and M was then obtained from k, b, and V with Equation (6). The maximum, minimum, mean, and standard error values of mass (M_acc and M) and volume (V_acc and V) are listed in Table 2.

A paired t-test was used for further comparison (Table 3). The p-values for M_acc versus M and for V_acc versus V were 0.2702 and 0.3022, respectively, so M_acc and V_acc were not significantly different from M and V (p > 0.05). This suggests that pomelo size did not affect the accuracy of the estimated values [21]. The standard deviations of the mass and volume differences were 14.9130 g and 73.4712 cm³, respectively.
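The paired t-test works on the per-fruit differences between measured and estimated values. A sketch using SciPy's scipy.stats.ttest_rel, with the t statistic also computed by hand to show what it is; the data in the usage line are hypothetical.

```python
import numpy as np
from scipy import stats


def paired_comparison(actual, estimated):
    """Paired t-test of actual vs. estimated values, plus the SD of the differences."""
    actual = np.asarray(actual, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    diff = actual - estimated
    sd_diff = diff.std(ddof=1)  # sample standard deviation of the differences
    t_manual = diff.mean() / (sd_diff / np.sqrt(diff.size))
    result = stats.ttest_rel(actual, estimated)
    return {"t": result.statistic, "t_manual": t_manual,
            "p": result.pvalue, "sd_diff": sd_diff}


# Hypothetical masses (g) of five fruit, measured and estimated from images.
print(paired_comparison([1000, 1100, 1200, 1300, 1400], [995, 1110, 1190, 1305, 1398]))
```

A p-value above 0.05 means no significant difference was detected; on its own it does not establish that the two methods are equivalent.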

In 1978, Pawlak introduced rough set theory, in which actual values and estimated values have an equivalence relation [54]. Here, the equivalence relation is interpreted as a linear equation obtained by linear fitting, as shown in Figure 9A,B. The R² value was 0.9859 for M_acc and M (Figure 9A) and 0.9879 for V_acc and V (Figure 9B). R² can be interpreted as the proportion of the variance in the estimated data explained by the variance in the actual data [21], so higher R² values indicate that the estimated data are closer to the actual data. The IIAS method thus achieved R² values of at least 0.98 for the estimation of mass and volume.

Figure 9C,D show Bland–Altman plots of the differences between the actual and estimated values of the samples [55]. The outer lines mark the 95% limits of agreement of the differences, d − 1.96 SD and d + 1.96 SD, and the center line represents the mean difference d. The mean percentage difference for mass, (M_acc − M)/mean, was 1.84%, with 95% limits of agreement of 0.89% and 2.78% (Figure 9C). The mean percentage difference for volume, (V_acc − V)/mean, was 7.2%, with 95% limits of agreement of 4.4% and 9.9% (Figure 9D). These results show that V may be 4.4% to 9.9% lower than V_acc, and M may be 0.89% to 2.78% lower than M_acc. Overall, the agreement between actual and estimated values was supported by rough set theory, so M and V can replace M_acc and V_acc for classifying the granulation levels of samples.
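A NumPy sketch of the Bland–Altman quantities in percentage form. It assumes that "(M_acc − M)/mean" means each difference divided by the mean of that pair of values, a common convention; the limits of agreement are d ± 1.96 SD as in the text. The volumes in the usage lines are hypothetical.

```python
import numpy as np


def bland_altman_percent(actual, estimated):
    """Mean percentage difference d and 95% limits of agreement d -/+ 1.96 SD."""
    actual = np.asarray(actual, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    pair_mean = (actual + estimated) / 2.0
    pct_diff = (actual - estimated) / pair_mean * 100.0
    d = pct_diff.mean()
    sd = pct_diff.std(ddof=1)
    return d, d - 1.96 * sd, d + 1.96 * sd


# Hypothetical volumes (cm^3): image-based estimates about 7% below the measurements.
measured = np.array([1500.0, 1620.0, 1750.0, 1880.0])
estimated = np.array([1395.0, 1510.0, 1630.0, 1745.0])
print(bland_altman_percent(measured, estimated))
```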

3.3. Classification of Granulation Levels

3.3.1. Discriminant Analysis of Models in Five Granulation Levels

Based on the visible and near-infrared transmittance spectroscopy and chemometrics, the LDA [56], PCA-SVM, and PCA-GRNN models were used to classify samples into five granulation levels. By discriminant analysis of the samples, the confusion matrices of the three models could be calculated, as shown in Table 4, and these data could be utilized to calculate TP, TN, FP, and FN.

In the confusion matrix, the ARM of each model is the ratio of the sum of the main-diagonal entries to the total number of samples. For the prediction set (Table 4), the ARM values of LDA, PCA-SVM, and PCA-GRNN, calculated with Equation (12), were 87.5%, 95.5%, and 95.5%, respectively. The predictive performance measures (CA, CS, and CSP) were derived from TP, TN, FP, and FN using Equations (13)–(15). Table 5 presents the predictive performance of the three models for the five granulation levels in the prediction set. Compared with PCA-SVM and PCA-GRNN, LDA had lower CA, CS, and CSP values for Class C. Therefore, PCA-SVM and PCA-GRNN outperformed LDA in discriminating the five granulation levels.

Figure 10a shows the LDA classification results for the prediction set. Class C was classified perfectly. Eleven samples in Class A and Class B were misclassified into other classes, so Class A and Class B overlapped; the same occurred for Class D and Class E. These misclassifications reduced the classification accuracy of LDA for all classes except Class C. Figure 10b shows the discriminant results of PCA-SVM for the prediction set at the five granulation levels. PCA-SVM outperformed LDA, classifying more samples correctly. For example, only three samples from Class A and Class B were misclassified by PCA-SVM, markedly fewer than with LDA. However, PCA-SVM misclassified some Class E samples as Class C and Class D, so Classes C and D overlapped with Class E. The sample distributions in Figure 10a,b are consistent with the confusion matrices in Table 4.

PCA-GRNN was superior to PCA-SVM for Class E. However, some samples of Class A and Class B were misclassified as Class C and Class D, respectively (Figure 11). Although there were fewer incorrectly classified samples of Class A and Class B for PCA-GRNN than for PCA-SVM, the CA, CS, and CSP values for Class A and Class B were lower for PCA-GRNN than for PCA-SVM (Table 4). Misclassified samples of Classes A and B also affected CA, CS, and CSP for Classes C and D in PCA-GRNN. Overall, PCA-GRNN had lower average values of CA and CS than PCA-SVM, but a relatively higher average CSP value (Table 5).

3.3.2. The Influence of Multi-Source Data Fusion on Multi-Category Classification Models

Multi-source data fusion can be divided into three levels: data-level fusion, character-level fusion, and decision-level fusion [57]. In character-level fusion, each data source is preprocessed to extract representative features. For example, the visible and near-infrared transmittance spectra were treated as multivariate data: Savitzky–Golay smoothing and normalization were used to preprocess the spectra, and PCA was then used to extract principal components, achieving character-level fusion and dimensionality reduction of the spectral data. However, this reduction results in decreased accuracy of object identification and decreased predictive performance of the classifier.

Rough set theory can be used to eliminate redundant attributes and improve efficiency [58], and external characteristics (M and V) can complement spectral data in data fusion [59]. Therefore, data-level fusion was performed on the external characteristics and the spectral data: the external characteristics (M and V) and the principal components of the preprocessed spectra were selected as condition attributes, and classifiers (PCA-SVM and PCA-GRNN) were then used to map them to the decision attribute (granulation level). This tested whether the external features (M and V) of the samples improve the predictive performance of the models for accurate assessment of different granulation levels. To investigate the advantages of multi-source data fusion, the predictive performances of the PCA-SVM and PCA-GRNN models were compared for samples at the five granulation levels.
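A minimal sketch of the fusion-plus-classifier step, assuming the fused condition attributes are simply the PC scores with M and V appended as extra columns and then z-scored. The scaling step is an assumption, added because mass in grams and volume in cm³ are on very different scales from PC scores. The classifier is a generalized regression neural network written from its textbook form: one Gaussian pattern unit per calibration sample, a summation layer that accumulates kernel-weighted one-hot class targets, and an output layer that normalizes these sums and picks the largest. The smoothing parameter `sigma` and all data in the usage example are hypothetical; this is not the authors' implementation, and the SVM branch is not shown.

```python
import numpy as np


def fuse_features(pc_scores, mass, volume, mean=None, std=None):
    """Data-level fusion (assumed form): [PC1..PCk, M, V], z-scored column-wise.

    Pass the calibration-set mean/std when transforming prediction-set samples.
    """
    fused = np.column_stack([pc_scores, mass, volume]).astype(float)
    if mean is None:
        mean, std = fused.mean(axis=0), fused.std(axis=0)
    return (fused - mean) / std, mean, std


class GRNNClassifier:
    """Generalized regression neural network used for classification."""

    def __init__(self, sigma: float = 0.5):
        self.sigma = sigma  # spread of the Gaussian pattern units (hypothetical value)

    def fit(self, features, labels):
        self.patterns = np.asarray(features, dtype=float)
        labels = np.asarray(labels)
        self.classes = np.unique(labels)
        # One-hot targets: one row per pattern unit, one column per class.
        self.targets = (labels[:, None] == self.classes[None, :]).astype(float)
        return self

    def predict_proba(self, features):
        features = np.asarray(features, dtype=float)
        # Squared Euclidean distance from every query to every pattern unit.
        d2 = ((features[:, None, :] - self.patterns[None, :, :]) ** 2).sum(axis=2)
        # Subtracting the row minimum avoids underflow; it cancels in the ratio below.
        weights = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2 * self.sigma**2))
        # Summation layer (numerator and denominator) followed by the output ratio.
        return (weights @ self.targets) / weights.sum(axis=1, keepdims=True)

    def predict(self, features):
        return self.classes[np.argmax(self.predict_proba(features), axis=1)]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    level = np.repeat(np.arange(5), 40)                       # 0..4 stand for Classes A..E
    labels = np.array(list("ABCDE"))[level]
    pcs = rng.normal(0.8 * level[:, None], 1.0, size=(200, 3))  # synthetic PC scores
    mass = rng.normal(2100 - 140 * level, 60)                   # synthetic, g
    volume = rng.normal(4200 - 420 * level, 150)                # synthetic, cm^3
    order = rng.permutation(200)
    cal, pred = order[:150], order[150:]
    x_cal, mean, std = fuse_features(pcs[cal], mass[cal], volume[cal])
    x_pred, _, _ = fuse_features(pcs[pred], mass[pred], volume[pred], mean, std)
    model = GRNNClassifier(sigma=0.5).fit(x_cal, labels[cal])
    accuracy = np.mean(model.predict(x_pred) == labels[pred])
    print(f"prediction-set accuracy on synthetic data: {accuracy:.2%}")
```

Reusing the calibration mean and standard deviation for the prediction set keeps the two sets on the same scale; GRNN has only the single parameter `sigma` to tune, which is one reason it is often described as having a flexible structure.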

The predictive results of PCA-SVM and PCA-GRNN for the prediction set are shown in Table 6 and Table 7. With multi-source data fusion, the ARM values of PCA-SVM and PCA-GRNN were 98% and 99%, respectively. As shown in Table 6, CA, CS, and CSP were all at least 0.96 for every class with PCA-SVM. Comparing the two models, PCA-GRNN performed better overall than PCA-SVM in terms of CA, CS, and CSP (Table 7). Both models achieved high predictive accuracy, and their confusion matrices contain markedly fewer misclassified samples than those obtained without fusion (Table 4). Multi-source data fusion therefore further improved the predictive performance of the nonlinear multi-category models (PCA-SVM and PCA-GRNN), with most samples assigned to the correct class. However, granulation levels of 0% and 1–25% still could not be fully distinguished, nor could levels of 51–75% and more than 75%. Overall, the PCA-GRNN model combined with multi-source data fusion was the most effective method for determining granulation levels in honey pomelo.
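The ARM values quoted above follow directly from the confusion matrices if ARM is taken as the number of correctly classified samples (the diagonal) divided by the total number of samples. The sketch below applies that assumed definition to the matrices transcribed from Tables 6 and 7, and also lists the off-diagonal cells to show where the remaining errors fall. The helper names are illustrative.

```python
CLASSES = "ABCDE"

# Prediction-set confusion matrices with fused features (rows: actual, columns: predicted).
PCA_SVM_FUSED = [
    [32, 0, 0, 0, 0],
    [1, 34, 0, 0, 0],
    [0, 0, 55, 0, 0],
    [0, 0, 0, 25, 1],
    [0, 0, 0, 2, 50],
]
PCA_GRNN_FUSED = [
    [47, 0, 0, 0, 0],
    [1, 54, 0, 0, 0],
    [0, 0, 26, 0, 0],
    [0, 0, 0, 39, 1],
    [0, 0, 0, 0, 32],
]


def accuracy_rate(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def confusions(matrix):
    """List (actual, predicted, count) for every non-zero off-diagonal cell."""
    return [
        (CLASSES[i], CLASSES[j], count)
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
        if i != j and count
    ]


if __name__ == "__main__":
    for name, matrix in [("PCA-SVM", PCA_SVM_FUSED), ("PCA-GRNN", PCA_GRNN_FUSED)]:
        print(f"{name}: ARM = {accuracy_rate(matrix):.1%}, errors = {confusions(matrix)}")
```

This reproduces 196/200 = 98% and 198/200 = 99%, and every remaining error lies between adjacent granulation levels (B/A and D/E), which matches the difficulty noted for 0% versus 1–25% and 51–75% versus more than 75%.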

4. Conclusions

The results of this study show that visible and near-infrared transmittance spectroscopy can capture the chemical changes in SSC, TA, and MC caused by granulation in honey pomelo. Machine vision technology can quickly estimate the external characteristics of samples, and the resulting data can be used to identify the changes in mass and volume caused by granulation. Combining the two methods improves the identification and detection of granulation in honey pomelo compared with using either method alone or traditional methods.

The results showed that multi-source data fusion was the optimal feature extraction method for granulation detection of honey pomelo, which improved the ARM values of PCA-SVM and PCA-GRNN from 95.5% to 98% and 99%, respectively. For PCA-SVM and PCA-GRNN, the CA of each class was enhanced at least from 0.9745 to 0.9947 and 0.9950, the CS was improved at least from 0.9231 to 0.9714 and 0.9750, and the CSP was improved at least from 0.9728 and 0.9815 to 0.9931 and 0.9934, respectively.

Visible and near-infrared transmittance spectroscopy combined with machine vision was successfully applied to detect granulation in honey pomelo. The predictive performances (measured by CA, CS, and CSP) of three classification methods (LDA, PCA-SVM, and PCA-GRNN) were compared. PCA-GRNN provided the best classification results, owing to the strong non-linear mapping ability of its flexible network structure. The developed multi-category model combined with multi-source data fusion could be used to non-destructively detect and grade granulation levels in fruit.

Author Contributions

X.S. performed the experiments, analyzed the data, and wrote the paper. S.X. reviewed and edited the manuscript. H.L. supervised the research. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Natural Science Foundation of China (Project No. 31901414), the Research and Development Program in Key Areas of Guangdong province (Project No. 2018B0202240001), the New Developing Subject Construction Program of Guangdong Academy of Agricultural Science (Project No. 201802XX), the Presidential Foundation of Guangdong Academy of Agricultural Science (Project No. 201920), and the Presidential Foundation of Guangdong Academy of Agricultural Science (Project No. 202034).

Acknowledgments

We are grateful for the support of the South China Agricultural University and Public Monitoring Center for Agro-Product of Guangdong Academy of Agricultural Sciences. The authors also thank the anonymous reviewers for their critical comments and suggestions to improve the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Definitions of abbreviations used in the manuscript.

Abbreviation	Definition
ARM	Accuracy Rate of Model
CA	Classification Accuracy
CS	Classification Sensitivity
CSP	Classification Specificity
GRNN	Generalized Regression Neural Network
LDA	Linear Discriminant Analysis
LW-NIR	Long-Wave Near-Infrared
MC	Moisture Content
IIAS	A System to Acquire Image Information
PCA	Principal Component Analysis
RBF	Radial Basis Function
SSC	Soluble Solids Content
SVM	Support Vector Machine
SW-NIR	Short-Wave Near-Infrared
TA	Titratable Acidity
TP	True Positives
TN	True Negatives
FP	False Positives
FN	False Negatives
WDM	Water Displacement Method

References

Zhou, Y.; He, W.; Zheng, W.; Tan, Q.; Xie, Z.; Zheng, C.; Hu, C. Fruit sugar and organic acid were significantly related to fruit Mg of six citrus cultivars. Food Chem. 2018, 259, 278–285.
Bartholomew, E.; Sinclair, W.; Raby, E. Granulation (crystallization) of Valencia oranges. Calif. Citrogr. 1934, 19, 88–89.
Xie, Z.; Zhuang, Y.; Wang, R.; Xu, W.; Huang, Y. Granulation and dehiscent segments of Guan honey pomelo fruits and their correlation to mineral nutrients. J. Fujian Agric. Univ. 1998, 27, 42–46.
Ritenour, M.; Albrigo, G.; Burns, J.K.; Miller, W.M. Granulation in Florida citrus. Proc. Fla. State Hortic. Soc. 2004, 117, 358–361.
Shomer, I.; Chalutz, E.; Vasiliver, R.; Lomaniec, E.; Berman, M. Sclerification of juice sacs in pummelo (Citrus grandis) fruit. Can. J. Bot. 1989, 67, 625–632.
Singh, R. 65-year research on citrus granulation. Indian J. Hortic. 2001, 58, 112–144.
Sinclair, W.B.; Jolliffe, V.A. Chemical changes in the juice vesicles of granulated Valencia oranges. J. Food Sci. 1961, 26, 276–282.
Burns, J.K.; Achor, D.S. Cell-wall changes in juice vesicles associated with section drying in stored late-harvested grapefruit. J. Am. Soc. Hortic. Sci. 1989, 114, 283–287.
Pailly, O.; Tison, G.; Amouroux, A. Harvest time and storage conditions of ‘Star Ruby’ grapefruit (Citrus paradisi Macf.) for short distance summer consumption. Postharvest Biol. Technol. 2004, 34, 65–73.
Burns, J.K.; Albrigo, L.G. Time of harvest and method of storage affect granulation in grapefruit. HortScience 1998, 33, 728–730.
Jie, D.F.; Xie, L.J.; Fu, X.P.; Rao, X.Q.; Ying, Y.B. Variable selection for partial least squares analysis of soluble solids content in watermelon using near-infrared diffuse transmission technique. J. Food Eng. 2013, 118, 387–392.
Jie, D.; Zhou, W.; Wei, X. Nondestructive detection of maturity of watermelon by spectral characteristic using NIR diffuse transmittance technique. Sci. Hortic-Amst. 2019, 257, 108718.
Tian, X.; Fan, S.; Huang, W.; Wang, Z.; Li, J. Detection of early decay on citrus using hyperspectral transmittance imaging technology coupled with principal component analysis and improved watershed segmentation algorithms. Postharvest Biol. Technol. 2020, 161, 111071.
Brereton, R.G. Pattern recognition in chemometrics. Chemometr. Intell. Lab. 2015, 149, 90–96.
Zhang, H.; Zhan, B.; Pan, F.; Luo, W. Determination of soluble solids content in oranges using visible and near infrared full transmittance hyperspectral imaging with comparative analysis of models. Postharvest Biol. Technol. 2020, 163, 111148.
Nyalala, I.; Okinda, C.; Nyalala, L.; Makange, N.; Chao, Q.; Chao, L.; Yousaf, K.; Chen, K. Tomato volume and mass estimation using computer vision and machine learning algorithms: Cherry tomato model. J. Food Eng. 2019, 263, 288–298.
Okinda, C.; Sun, Y.; Nyalala, I.; Korohou, T.; Opiyo, S.; Wang, J.; Shen, M. Egg volume estimation based on image processing and computer vision. J. Food Eng. 2020, 283, 110041.
Bargoti, S.; Underwood, J.P. Image segmentation for fruit detection and yield estimation in apple orchards. J. Field Robot. 2017, 34, 1039–1060.
Khojastehnazhand, M.; Mohammadi, V.; Minaei, S. Maturity detection and volume estimation of apricot using image processing technique. Sci. Hortic-Amst. 2019, 251, 247–251.
Hahn, F.; Sanchez, S. Carrot volume evaluation using imaging algorithms. J. Agric. Eng. Res. 2000, 75, 243–249.
Omid, M.; Khojastehnazhand, M.; Tabatabaeefar, A. Estimating volume and mass of citrus fruits by image processing technique. J. Food Eng. 2010, 100, 315–321.
Zhang, Z. A Flexible New Technique for Camera Calibration. IEEE Trans. Pattern Anal. 2000, 22, 1330–1334.
Yamamoto, K.; Guo, W.; Yoshioka, Y.; Ninomiya, S. On Plant Detection of Intact Tomato Fruits Using Image Analysis and Machine Learning Methods. Sensors 2014, 14, 12191–12206.
Sabliov, C.M.; Boldor, D.; Keener, K.M.; Farkas, B.E. Image Processing Method To Determine Surface Area and Volume of Axi-Symmetric Agricultural Products. Int. J. Food Prop. 2002, 5, 641–653.
Tabatabaeefar, A.; Rajabipour, A. Modeling the mass of apples by geometrical attributes. Sci. Hortic-Amst. 2005, 105, 373–382.
Shafiee, S.; Minaei, S. Combined data mining/NIR spectroscopy for purity assessment of lime juice. Infrared Phys. Technol. 2018, 91, 193–199.
Xiaobo, Z.; Jiewen, Z.; Povey, M.J.; Holmes, M.; Hanpin, M. Variables selection methods in near-infrared spectroscopy. Anal. Chim. Acta 2010, 667, 14–32.
Geladi, P.; MacDougall, D.; Martens, H. Linearization and scatter-correction for near-infrared reflectance spectra of meat. Appl. Spectrosc. 1985, 39, 491–500.
Dong, X.; Dong, J.; Li, Y.; Xu, H.; Tang, X. Maintaining the predictive abilities of egg freshness models on new variety based on VIS-NIR spectroscopy technique. Comput. Electron. Agric. 2019, 156, 669–676.
Mohsenin, N.N. Physical properties of plant and animal materials. J. Agric. Eng. Res. 1968, 13, 379.
Soltan, S.S.A.M. The Effects of Varieties Sources of Omega-3 Fatty Acids on Diabetes in Rats. Food Nutr. Sci. 2012, 3, 1404–1412.
Theanjumpol, P.; Wongzeewasakun, K.; Muenmanee, N.; Wongsaipun, S.; Krongchai, C.; Changrue, V.; Boonyakiat, D.; Kittiwachana, S. Non-destructive identification and estimation of granulation in ‘Sai Num Pung’ tangerine fruit using near infrared spectroscopy and chemometrics. Postharvest Biol. Technol. 2019, 153, 13–20.
Brereton, R.G.; Lloyd, G.R. Partial least squares discriminant analysis: Taking the magic away. J. Chemometr. 2014, 28, 213–225.
Jia, W.; Liang, G.; Tian, H.; Sun, J.; Wan, C. Electronic Nose-Based Technique for Rapid Detection and Recognition of Moldy Apples. Sensors 2019, 19, 1526.
Navarro, P.J.; Perez, F.; Weiss, J.; Egea-Cortines, M. Machine Learning and Computer Vision System for Phenotype Data Acquisition and Analysis in Plants. Sensors 2016, 16, 641.
Vijayanand, R.; Devaraj, D.; Kannapiran, B. Intrusion detection system for wireless mesh network using multiple support vector machine classifiers with genetic-algorithm-based feature selection. Comput. Secur. 2018, 77, 304–314.
Xie, T.; Yao, J.; Zhou, Z. DA-Based Parameter Optimization of Combined Kernel Support Vector Machine for Cancer Diagnosis. Processes 2019, 7, 263.
Yang, X.; Hong, H.; You, Z.; Cheng, F. Spectral and Image Integrated Analysis of Hyperspectral Data for Waxy Corn Seed Variety Classification. Sensors 2015, 15, 15578–15594.
Niu, G.; Han, T.; Yang, B.S.; Tan, A.C. Multi-agent decision fusion for motor fault diagnosis. Mech. Syst. Signal Process. 2007, 21, 1285–1299.
Feng, L.; Zhu, S.; Lin, F.; Su, Z.; Yuan, K.; Zhao, Y.; He, Y.; Zhang, C. Detection of Oil Chestnuts Infected by Blue Mold Using Near-Infrared Hyperspectral Imaging Combined with Artificial Neural Networks. Sensors 2018, 18, 1944.
Zhang, L.; Li, G.; Sun, M.; Li, H.; Wang, Z.; Li, Y.; Lin, L. Kennard-Stone combined with least square support vector machine method for noncontact discriminating human blood species. Infrared Phys. Technol. 2017, 86, 116–119.
Hao, Y.; Geng, P.; Wu, W.; Wen, Q.; Rao, M. Identification of Rice Varieties and Transgenic Characteristics Based on Near-Infrared Diffuse Reflectance Spectroscopy and Chemometrics. Molecules 2019, 24, 4568.
Guo, Y.; Ni, Y.; Kokot, S. Evaluation of chemical components and properties of the jujube fruit using near infrared spectroscopy and chemometrics. Spectrochim. Acta Part A 2016, 153, 79–86.
Qiu, G.; Lü, E.; Wang, N.; Lu, H.; Wang, F.; Zeng, F. Cultivar Classification of Single Sweet Corn Seed Using Fourier Transform Near-Infrared Spectroscopy Combined with Discriminant Analysis. Appl. Sci. 2019, 9, 1530.
Chong, I.G.; Jun, C.H. Performance of some variable selection methods when multicollinearity is present. Chemometr. Intell. Lab. 2005, 78, 103–112.
Wang, X.Y.; Wang, P.; Qi, Y.P.; Zhou, C.P.; Yang, L.T.; Liao, X.Y.; Wang, L.Q.; Zhu, D.H.; Chen, L.S. Effects of granulation on organic acid metabolism and its relation to mineral elements in Citrus grandis juice sacs. Food Chem. 2014, 145, 984–990.
Ruan, Y.L. Sucrose metabolism: Gateway to diverse carbon use and sugar signaling. Annu. Rev. Plant Biol. 2014, 65, 33–67.
Rittiron, R.; Narongwongwattana, S.; Boonprakob, U.; Seehalak, W. Rapid and nondestructive detection of watercore and sugar content in Asian pear by near infrared spectroscopy for commercial trade. J. Innov. Opt. Health Sci. 2014, 7, 1350073.
Xie, L.; Ying, Y.; Ying, T. Combination and comparison of chemometrics methods for identification of transgenic tomatoes using visible and near-infrared diffuse transmittance technique. J. Food Eng. 2007, 82, 395–401.
Guo, Z.; Wang, M.; Agyekum, A.A.; Wu, J.; Chen, Q.; Zuo, M.; El-Seedi, H.R.; Tao, F.; Shi, J.; Ouyang, Q.; et al. Quantitative detection of apple watercore and soluble solids content by near infrared transmittance spectroscopy. J. Food Eng. 2020, 279, 109955.
Tigabu, M.; Odén, P.C. Discrimination of viable and empty seeds of Pinus patula Schiede & Deppe with near-infrared spectroscopy. New For. 2003, 25, 163–176.
Marton, J.; Sparks, H.E. Determination of lignin in pulp and paper by infrared multiple internal reflectance. Tappi 1967, 50, 363–368.
Wu, J.L.; Pan, T.F.; Guo, Z.X.; Pan, D.M. Specific lignin accumulation in granulated juice sacs of Citrus maxima. J. Agric. Food Chem. 2014, 62, 12082–12089.
Pawlak, Z.; Grzymala-Busse, J.; Slowinski, R. Rough sets. Commun. ACM 1995, 38, 89–95.
Bland, J.M.; Altman, D.G. Measuring agreement in method comparison studies. Stat. Methods Med. Res. 1999, 8, 135–160.
Mendoza, F.; Lu, R.; Cen, H. Grading of apples based on firmness and soluble solids content using Vis/SWNIR spectroscopy and spectral scattering techniques. J. Food Eng. 2014, 125, 59–68.
Liu, T.; Li, A.; Ding, Y.; Zhao, D.; Li, Z. Multi-source information fusion applied to structural damage diagnosis. Struct. Infrastruct. Eng. 2011, 7, 353–367.
Dong, G.; Zhang, Y.; Dai, C.; Fan, Y. The Processing of Information Fusion Based on Rough Set Theory. Chin. J. Sci. Instrum. 2005, 26, 1450–1451.
Karunaratne, S.; Thomson, A.; Morse-McNabb, E.; Wijesingha, J.; Stayches, D.; Copland, A.; Jacobs, J. The Fusion of Spectral and Structural Datasets Derived from an Airborne Multispectral Sensor for Estimation of Pasture Dry Matter Yield at Paddock Scale with Time. Remote Sens. 2020, 12, 2017.

Figure 1. System to acquire image information.

Figure 2. (a) The dimensions of an ellipsoid; (b) the partitioning and calculation of an ellipsoid.

Figure 3. System used to acquire visible and near-infrared transmittance spectra.

Figure 4. Contribution rate distribution of principal components (PCs).

Figure 5. Structure of the generalized regression neural network (GRNN).

Figure 6. External characteristics of honey pomelo: (A) Samples of Class A; (B) Samples of Class B; (C) Samples of Class C; (D) Samples of Class D; (E) Samples of Class E. Granulation levels of honey pomelo: (a) Samples of Class A; (b) Samples of Class B; (c) Samples of Class C; (d) Samples of Class D; (e) Samples of Class E.

Figure 7. Visible and near-infrared transmittance spectra of samples: (a,c) raw spectra; (b,d) mean spectra for different granulation levels.

Figure 8. Linear fitting analysis of the actual mass (M_acc) as weighed with a balance and the actual volume (V_acc) as measured using WDM.

Figure 9. (A,B) Linear fitting analysis of mass (M and M_acc) and volume (V and V_acc), respectively; (C,D) Bland–Altman plot of mass (M and M_acc) and volume (V and V_acc), respectively.

Figure 10. Discriminant analysis plot of multi-category classification models. (a) LDA model; (b) PCA-SVM model.

Figure 11. Discriminant analysis 3D-plot of PCA-GRNN model for predicted samples.

Table 1. Fruit characteristics, moisture content (MC), soluble solids content (SSC), and titratable acidity (TA) of honey pomelo for each granulation class.

Class	N	Mass (g)	Volume (cm³)	Width (mm)	Length (mm)	MC (%)	SSC (%)	TA (%)
A	146	2114.16 ± 8.04 ^a	4237.03 ± 19.83 ^a	193.82 ± 3.93 ^a	216.91 ± 1.93 ^a	87.9 ± 1.7 ^a	12.4 ± 0.8 ^a	0.86 ± 0.09 ^a
B	166	1953.23 ± 6.32 ^ab	3956.70 ± 17.46 ^a	190.86 ± 8.09 ^a	212.93 ± 8.66 ^ab	86.9 ± 1.4 ^ab	10.9 ± 0.6 ^b	0.82 ± 0.13 ^a
C	76	1900.83 ± 17.45 ^b	3358.68 ± 18.38 ^b	177.72 ± 8.17 ^b	202.90 ± 0.46 ^bc	85.2 ± 5.5 ^bc	10.3 ± 0.5 ^c	0.64 ± 0.09 ^b
D	120	1696.63 ± 17.06 ^c	3322.64 ± 12.97 ^b	177.51 ± 3.68 ^b	197.29 ± 5.88 ^c	84.2 ± 3.2 ^bc	10.1 ± 0.8 ^c	0.63 ± 0.07 ^b
E	92	1546.62 ± 14.69 ^c	2468.45 ± 15.79 ^c	162.03 ± 8.87 ^c	172.58 ± 6.76 ^d	83.3 ± 6.5 ^d	10.0 ± 0.6 ^c	0.54 ± 0.05 ^c

Note: Values with different superscript letters in the same column are significantly different (p < 0.05); N is the number of samples; Class A to Class E represent 0%, 1–25%, 26–50%, 51–75%, and greater than 75% granulation, respectively.

Table 2. Comparison of measured and estimated values for mass and volume.

	Mass (g)		Volume (cm³)		k	b
	M_acc	M	V_acc	V	k	b
Maximum	2735.12	2720.72	5660.00	5243.56	2.1076	−490.6325
Minimum	893.89	885.39	1440.00	1375.42
Mean	1316.96	1252.89	2285.00	2149.96
Standard error	423.35	386.97	924.60	902.99	0.0419	63.4135

Table 3. Paired t-test analyses between measured values and estimated values of external characteristics.

External Characteristics	Paired t-Test	Standard Deviation	95% Confidence Interval for the Mean Differences (%)
M_acc and M (g)	0.2702	14.9130	1.7392; 1.9325
V_acc and V (cm³)	0.3022	73.4712	6.8991; 7.4625

Table 4. Confusion matrix of the samples in the models (columns A–E give the predicted class).

Method	Calibration set							Prediction set
Method	N	Actual Class	A	B	C	D	E	N	Actual Class	A	B	C	D	E
LDA	131	A	83	48	0	0	0	15	A	11	4	0	0	0
	101	B	12	89	0	0	0	65	B	7	58	0	0	0
	41	C	0	3	38	0	0	35	C	0	0	35	0	0
	70	D	0	0	2	63	5	50	D	0	0	0	45	5
	57	E	0	0	3	17	37	35	E	0	0	0	9	26
PCA-SVM	114	A	104	10	0	0	0	32	A	32	2	0	0	0
	131	B	6	125	0	0	0	35	B	1	34	0	0	0
	21	C	0	0	16	0	5	55	C	0	0	52	0	3
	94	D	0	0	0	88	6	26	D	0	0	0	25	1
	40	E	0	0	3	4	33	52	E	0	0	2	2	48
PCA-GRNN	99	A	90	5	4	0	0	47	A	44	1	2	0	0
	111	B	3	106	0	2	0	55	B	1	54	0	2	0
	50	C	4	0	46	0	0	26	C	2	0	24	0	0
	80	D	0	0	0	71	9	40	D	0	0	0	37	3
	60	E	0	0	0	8	52	32	E	0	0	0	0	32

Table 5. Predictive performances of three multi-category classification models for five granulation levels in the prediction set.

Class	LDA			PCA-SVM			PCA-GRNN
Class	CA	CS	CSP	CA	CS	CSP	CA	CS	CSP
A	0.9409	0.7333	0.9591	0.9845	0.9412	0.9938	0.9695	0.9362	0.9800
B	0.9409	0.8923	0.9669	0.9845	0.9714	0.9874	0.9795	0.9474	0.9928
C	1	1	1	0.9745	0.9455	0.9858	0.9795	0.9231	0.9882
D	0.9259	0.9000	0.9353	0.9845	0.9615	0.9881	0.9745	0.9250	0.9872
E	0.9259	0.7429	0.9675	0.9598	0.9231	0.9728	0.9845	1	0.9815

Table 6. The influence of multi-source data fusion on the predictive performance of PCA-SVM.

Data	N	Actual Class	Predicted Class					PCA-SVM
Data	N	Actual Class	A	B	C	D	E	CA	CS	CSP
The principal components of the spectral data + External characteristics data (M and V)	32	A	32	0	0	0	0	0.9948	1	0.9938
	35	B	1	34	0	0	0	0.9948	0.9714	1
	55	C	0	0	55	0	0	1	1	1
	26	D	0	0	0	25	1	0.9847	0.9615	0.9882
	52	E	0	0	0	2	50	0.9847	0.9615	0.9931

Table 7. The influence of multi-source data fusion on the predictive performance of PCA-GRNN.

Data	N	Actual Class	Predicted Class					PCA-GRNN
Data	N	Actual Class	A	B	C	D	E	CA	CS	CSP
The principal components of the spectral data + External characteristics data (M and V)	47	A	47	0	0	0	0	0.9950	1	0.9934
	55	B	1	54	0	0	0	0.9950	0.9818	1
	26	C	0	0	26	0	0	1	1	1
	40	D	0	0	0	39	1	0.9950	0.9750	1
	32	E	0	0	0	0	32	0.9950	1	0.9940

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Sun, X.; Xu, S.; Lu, H. Non-Destructive Identification and Estimation of Granulation in Honey Pomelo Using Visible and Near-Infrared Transmittance Spectroscopy Combined with Machine Vision Technology. Appl. Sci. 2020, 10, 5399. https://doi.org/10.3390/app10165399
