An Automated Image Processing Module for Quality Evaluation of Milled Rice

Kurade, Chinmay; Meenu, Maninder; Kalra, Sahil; Miglani, Ankur; Neelapu, Bala Chakravarthy; Yu, Yong; Ramaswamy, Hosahalli S.

doi:10.3390/foods12061273



by Chinmay Kurade ¹, Maninder Meenu ², Sahil Kalra ¹,*, Ankur Miglani ³, Bala Chakravarthy Neelapu ⁴, Yong Yu ²,* and Hosahalli S. Ramaswamy ⁵

¹ Department of Mechanical Engineering, Indian Institute of Technology, Jammu 181221, India

² College of Biosystems Engineering and Food Science, Zhejiang University, 866 Yuhangtang Road, Hangzhou 310058, China

³ Department of Mechanical Engineering, Indian Institute of Technology, Indore 453552, India

⁴ Department of Biotechnology and Engineering, National Institute of Technology, Rourkela 769008, India

⁵ Department of Food Science, McGill University, 21111 Lakeshore Road, St-Anne-de-Bellevue, QC H9X 3V9, Canada

* Authors to whom correspondence should be addressed.

Foods 2023, 12(6), 1273; https://doi.org/10.3390/foods12061273

Submission received: 10 February 2023 / Revised: 8 March 2023 / Accepted: 9 March 2023 / Published: 16 March 2023

(This article belongs to the Special Issue Sensors for Food Safety and Quality Assessment)


Abstract

The paper demonstrates a low-cost rice quality assessment system based on image processing and machine learning (ML) algorithms. A Raspberry-Pi-based image acquisition module was developed to extract structural and geometric features from 3081 images of eight rice varieties. Based on features such as perimeter, area, solidity, roundness, compactness, and shape factor, an automatic identification system is developed that segments the grains and classifies them by variety using seven ML algorithms. The models are trained on features extracted from the images and compared with one another, and ROC curves are plotted for each model to assess its performance quantitatively. The random forest classifier, with an accuracy of 77 percent, is found to be the best-performing model for the classification of rice varieties. Furthermore, the same algorithm is employed to estimate the price of adulterated rice samples from the market prices of the individual rice varieties.

Keywords:

automation; computer vision; Raspberry-Pi; quality assessment; machine learning; rice grains

1. Introduction

Rice is the most extensively consumed cereal around the globe [1]. The quality of rice grain has a considerable impact on both the yield that farmers obtain and its economic return. Several improved rice varieties have been developed in the past few decades to meet consumer demand for high-quality rice [2]. However, seasonal and geographical variations cause significant differences in the yield and in the physical and nutritional quality of rice of the same variety, which in turn affect its market value [3,4,5]. As a result, rice is extremely vulnerable to fraud around the world [1]. Four types of rice adulteration are observed in the market: (a) substitution with low-cost look-alike materials, (b) substitution with low-quality rice grains, (c) dilution of the original product, and (d) mislabeling of the age and origin of the material [6,7]. Such adulteration can cause significant economic losses for food companies and consumers. The commercial value of grains such as rice depends primarily on their chemical composition and structural features. Their chemical constituents are determined using laboratory procedures that are time-consuming and costly, have an ecological footprint, and require specialized knowledge. Vibrational spectroscopy combined with chemometrics has also been employed to determine the chemical composition and varietal classification of various food grains [8,9,10,11]; these methods are non-destructive and eco-friendly but require expensive instruments. The structural features of rice grains comprise geometric features (size, shape) and other traits such as color, chalkiness, and morphological and textural features. These physical features are generally visible to the naked eye and can be measured manually. However, manual inspection is laborious, inconsistent, subjective, and time-consuming [12].
Near-infrared spectroscopy and Fourier-transform infrared spectroscopy have also been employed efficiently for adulteration detection in various food materials with minimal sample preparation [13,14,15,16]. In addition, near-infrared hyperspectral imaging has been used for accurate classification and quantification of adulteration in different foods [17,18,19,20]. The physical features of rice grains described above can also be detected via computer vision, a process that is easily automated. Computer vision techniques are efficient, non-destructive, and fast, and can be employed at an industrial scale for on-site detection of rice adulteration [21,22]. Furthermore, with advances in computer science and in mechanical and automation engineering, there is huge scope for machine-learning-based image processing and computer vision in food quality assessment [23]. Several studies have successfully employed digital image processing (DIP) techniques for feature extraction, classification, and quality prediction of various food materials [24]. DIP-based food quality assessment is a fast, non-invasive, non-destructive, safe, energy-efficient, and low-cost technique that does not require skilled personnel to operate the instrument. Previously, researchers have successfully employed DIP for fractal analysis of the retrogradation of rice starch [25] and for authentication of rice varieties and the derived flour [6,26]. In these studies, the individual rice grains were segmented from the background, and their extracted geometric parameters were used to classify the grains as high, medium, or low quality. In the present study, the grains are classified using image processing and machine learning algorithms. Rice grain segmentation and feature extraction are performed automatically by image processing algorithms, and the extracted features are fed to machine learning algorithms that classify the grains automatically, which makes the system fully automatic.
In addition, previous studies employed expensive cameras or smartphone cameras as image-capturing devices [24], which adds to the cost of the overall setup. In the present study, by contrast, the Raspberry-Pi module used is a low-cost, portable, and easy-to-use device, resulting in an overall setup cost of only USD 50. Thus, the primary objective of the study is to develop an efficient, fully automated, and cost-effective system to classify rice grains by variety, namely Basmati, Eco Kolam, HMT Kolam, Kana Basmati, Sona Masuri, Tibar Basmati, Tukda Basmati, and Wada Kolam. This study explored different ML algorithms, namely logistic regression (LR), decision tree (DT), random forest (RF), multilayer perceptron (MLP), and support vector machine (SVM) with linear, polynomial, radial basis function (RBF), and sigmoid kernels.

2. Materials and Methods

2.1. Raspberry-Pi-Based Machine Vision System

The low-cost Raspberry-Pi module is used for the classification of different types of rice grains (Figure 1a). The module is equipped with a 16 GB memory card that serves both as the primary disk for the Raspberry-Pi desktop operating system (Debian version 10, "buster") and as the storage medium for the captured rice images. Figure 1b shows the steps followed to classify rice with the Raspberry-Pi-based machine vision system: image acquisition, pre-processing, segmentation, feature extraction, and the training and testing of machine learning models for classification.

2.2. Sample Collection

Eight varieties of rice grains commonly consumed in the region, namely Basmati (BM), Kana Basmati (KB), Tibar Basmati (TB), Tukda Basmati (TKB), Eco Kolam (EK), HMT Kolam (HK), Wada Kolam (WK), and Sona Masuri (SM), were collected from a local market in Mumbai, India, in 2020. Supplementary Table S1 details the local market price and the number of samples for each rice variety. Most of the collected rice grains were healthy; partial chalkiness was observed in only a few grains.

2.3. Imaging System

The rice grain images were acquired using a 5 MP integrated IR-cut camera (OV5647 5MP 1080P) from OmniVision Technologies (Santa Clara, CA, USA) with a 0.25-inch CCD sensor and an adjustable focal length. The IR-cut filter reduces color distortions caused by infrared light in daylight. The camera was interfaced with the Raspberry-Pi (Raspberry Pi Foundation, Cambridge, UK) through a flat 15-pin Camera Serial Interface (CSI) ribbon cable, which carries both power and data. The raspi-config configuration tool is installed on the Raspberry-Pi module. A Python script launched from the host computer (Dell Inc., i7 processor, 64 GB RAM, and Nvidia GPU) was used to start the camera and acquire the images, which were transferred using the Secure File Transfer Protocol (SFTP) and stored in JPEG format. All rice grain images were taken against a dark blue background at a fixed distance from the camera (Figure 1a). The blue background is preferred over a black one because it provides better contrast, which makes grey and black objects such as soil or rock particles easy to identify. The complete experimental setup was placed in a closed chamber, and multiple LED lights were placed at uniform distances to increase illumination.

2.4. Dataset Details

The image dataset consists of 3081 images, with approximately 230 images acquired for each variety (Supplementary Table S1). For each image, 80 randomly selected grains were spread uniformly so that they did not touch one another. Each image is in RGB format with a resolution of 2592 × 1944 pixels. To analyze each rice grain individually, image segmentation is performed (Section 2.5.1). A 5 × 5 median filter is used to remove salt-and-pepper noise caused by stray reflections from the surroundings. For each pixel, this filter considers the surrounding 5 × 5-pixel region and replaces the pixel value with the median value of that region, separately for each of the R, G, and B channels.
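The per-channel median filtering step can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: it assumes an H × W × 3 `uint8` array and edge-replicated borders (both assumptions), and the helper name `median_filter_rgb` is hypothetical. In OpenCV the comparable call would be `cv2.medianBlur(image, 5)`; the window view used here is memory-hungry, so the sketch is meant for small crops rather than full 2592 × 1944 frames.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter_rgb(image: np.ndarray, size: int = 5) -> np.ndarray:
    """Replace each pixel by the median of its size x size neighborhood, per channel.

    Borders are handled by replicating edge pixels (an assumption for this sketch).
    """
    pad = size // 2
    # Pad only the two spatial axes; the channel axis is left untouched
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    # View of every size x size window: shape (H, W, C, size, size)
    windows = sliding_window_view(padded, (size, size), axis=(0, 1))
    # Median over the window axes gives one value per pixel and channel
    return np.median(windows, axis=(-2, -1)).astype(image.dtype)
```

A single bright "salt" pixel on a dark background is removed entirely, because it is outnumbered by its 24 dark neighbors in every window that contains it.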

2.5. Image Post-Processing

2.5.1. Image Segmentation

Image segmentation is performed to isolate each rice grain for feature extraction. The original RGB image is converted to a grayscale image, and the individual rice grains are separated from the background using Otsu's thresholding technique [27]. Subsequently, the marker-based Watershed algorithm is used to extract the foreground and the background and to detect the grain boundaries. The OpenCV implementation of Otsu's thresholding was used. Determining the threshold value involves computing the histogram of the grayscale image and the probability of occurrence of each intensity level k using Equation (1):

p(k) = \frac{\text{number of pixels with intensity } k}{\text{total number of pixels}}

(1)

The histogram for a representative image is shown in Supplementary Figure S1.

Second, the class probabilities are calculated as

U_{0}(t) = \sum_{k = t_{min}}^{t - 1} p(k), \qquad U_{1}(t) = \sum_{k = t}^{t_{max}} p(k)

(2)

where U_{0} and U_{1} are the probabilities of the two classes separated by the threshold t, with U_{0}(0) = 0 and U_{1}(0) = 1.

Third, the class means are calculated as

m_{0}(t) = \sum_{k = t_{min}}^{t - 1} \frac{k\, p(k)}{U_{0}(t)}, \qquad m_{1}(t) = \sum_{k = t}^{t_{max}} \frac{k\, p(k)}{U_{1}(t)}

(3)

where m_{i} are the class means (the classes being 0 and 1 for binary thresholding), initially m_{0}(0) = 0 and m_{1}(0) = 1. The parameters t and k represent the threshold and pixel intensity levels, respectively.

Subsequently, the interclass variance is calculated as

V_{b}^{2}(t) = U_{0}(t)\, U_{1}(t)\, [m_{0}(t) - m_{1}(t)]^{2}

(4)

The search over all candidate thresholds keeps the running maximum: if V_{b}^{2}(t) > V_{b,max}^{2}, then V_{b,max}^{2} = V_{b}^{2}(t) and thresh = t, where V_{b}^{2}(t) is the interclass variance and V_{b,max}^{2} is the largest interclass variance found so far. The pixel intensity varies from 0 to 255.

The final threshold is the intensity value corresponding to the maximum variance V_{b,max}^{2}, and it is used to generate the binary image. The original image and the result after thresholding are shown in Supplementary Figures S2a and S2b, respectively. A representative example of image thresholding with 150 rice grains is shown in Supplementary Figure S3 for different values of the threshold intensity, along with the corresponding values of class mean, variance, and threshold.
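The exhaustive threshold search of Equations (1) to (4) can be sketched directly from the histogram. This is an illustrative NumPy version, assuming an 8-bit grayscale array; `otsu_threshold` is a hypothetical name, and the paper itself used OpenCV's implementation (e.g. `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`), whose convention for which side of the threshold a pixel falls on may differ slightly from this sketch.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the threshold t maximizing the interclass variance (Eqs. 1-4).

    Class 0 holds intensities k < t and class 1 holds k >= t.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()                      # Eq. (1): p(k)
    k = np.arange(256)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        u0, u1 = p[:t].sum(), p[t:].sum()      # Eq. (2): class probabilities
        if u0 == 0 or u1 == 0:
            continue                           # one class is empty: not a valid split
        m0 = (k[:t] * p[:t]).sum() / u0        # Eq. (3): class means
        m1 = (k[t:] * p[t:]).sum() / u1
        var_b = u0 * u1 * (m0 - m1) ** 2       # Eq. (4): interclass variance
        if var_b > best_var:                   # keep the running maximum
            best_var, best_t = var_b, t
    return best_t
```

The binary image is then `gray >= otsu_threshold(gray)`. For a bimodal image with grains at intensity 200 on a background at 50, any threshold between the two modes gives the same maximal variance, and the first such value is returned.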

Next, each rice grain is segmented using the marker-based Watershed algorithm [27]. In this algorithm, the markers are generated by applying a morphological erosion operation to the binary image. Erosion removes noisy pixels, smooths object boundaries, and strips away the outer layer of object pixels. The operation takes an input image and a structuring element and removes boundary pixels depending on how the structuring element overlaps the object. A 3 × 3 structuring element is used so that rice grains lying close to each other are separated into two different grains and assigned different markers. Subsequently, connected component analysis is performed to obtain a label for each marker. The markers are then passed to the Watershed algorithm, which generates the final grain labels and completes the segmentation process. Each labelled grain is represented by a unique, randomly colored mask; the images after thresholding and segmentation are shown in Supplementary Figures S2b and S2c, respectively. After segmentation, the individual rice grain images are stored in JPEG format. Note that the Watershed algorithm fails to separate grains that touch each other (Supplementary Figure S4); such grains are therefore removed from the image dataset.
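The marker-generation step (3 × 3 erosion followed by connected component labelling) can be sketched as below. This is an assumption-laden illustration rather than the authors' pipeline: the helpers `erode` and `label_components` are hypothetical, borders are zero-padded, and 4-connectivity is assumed for labelling. In OpenCV the corresponding calls would be `cv2.erode`, `cv2.connectedComponents`, and finally `cv2.watershed`; the watershed flooding itself is not reimplemented here.

```python
from collections import deque

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def erode(binary: np.ndarray, size: int = 3) -> np.ndarray:
    """Binary erosion with a size x size square structuring element (zero-padded borders)."""
    pad = size // 2
    padded = np.pad(binary.astype(bool), pad, mode="constant", constant_values=False)
    windows = sliding_window_view(padded, (size, size))
    # A pixel survives only if its whole neighborhood is foreground
    return windows.all(axis=(-2, -1))


def label_components(mask: np.ndarray) -> tuple[np.ndarray, int]:
    """4-connected component labelling by breadth-first search; label 0 is background."""
    labels = np.zeros(mask.shape, dtype=int)
    n = 0
    rows, cols = mask.shape
    for r in range(rows):
        for c in range(cols):
            if mask[r, c] and labels[r, c] == 0:
                n += 1                              # start a new marker
                labels[r, c] = n
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = n
                            queue.append((ny, nx))
    return labels, n
```

Two grains joined by a one-pixel-wide bridge form a single component in the binary mask, but erosion deletes the bridge, so labelling the eroded mask yields two separate markers that the Watershed step can grow back into two grains.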

2.5.2. Feature Extraction

In this step, four types of features, namely geometrical, morphological, color, and textural features, are extracted from the individual rice grain images to train the machine learning models. The methodology for extracting each type of feature is detailed in the following sections.

2.5.3. Geometrical and Morphological Features

The perimeter, area, solidity, roundness, compactness, and shape factor are extracted from the individual rice grain images using the OpenCV library [20]. A complete list of all the extracted features, along with their mathematical expressions, is given in Supplementary Table S2. First, a contour (Supplementary Figure S4) is generated for each rice grain by applying edge detection to the binary image. This contour is the boundary enclosing a region of uniform pixel intensity. The region bounded by the contour represents the 2D-projected area of the grain, while the contour length represents the grain perimeter. These geometrical parameters are used to determine the equivalent diameter and other morphological features, such as the roundness and compactness of each grain. Subsequently, the ellipse that best fits the contour is constructed around the rice grain to determine its length (major axis), width (minor axis), and aspect ratio; the length and width of a grain are approximated by the lengths of the major and minor axes of the ellipse, respectively. The length, width, and aspect ratio of a few representative rice grains are shown in Supplementary Figure S5. Additionally, a bounding box is fitted to the contour to determine further morphological features, namely the spatial extent and the shape factor of the rice grains.
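To make the contour-based quantities concrete, the sketch below computes several of them from a contour given as a list of (x, y) vertices. The formulas used (shoelace area, equivalent diameter √(4A/π), roundness 4πA/P², solidity A/A_hull, extent A/A_bbox) are common textbook definitions and are assumptions here; the paper's exact expressions are in Supplementary Table S2. In OpenCV, `cv2.contourArea`, `cv2.arcLength`, `cv2.convexHull`, and `cv2.boundingRect` provide the building blocks; the helper names below are hypothetical.

```python
import math

import numpy as np


def polygon_area(pts: np.ndarray) -> float:
    """Shoelace formula for a closed polygon given as an (n, 2) array."""
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))


def polygon_perimeter(pts: np.ndarray) -> float:
    """Sum of edge lengths, including the closing edge."""
    return float(np.linalg.norm(pts - np.roll(pts, -1, axis=0), axis=1).sum())


def convex_hull(pts: np.ndarray) -> np.ndarray:
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(map(tuple, pts))
    if len(pts) <= 2:
        return np.array(pts, dtype=float)

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return np.array(lower[:-1] + upper[:-1], dtype=float)


def shape_features(contour) -> dict:
    """Assumed textbook shape descriptors for one grain contour."""
    pts = np.asarray(contour, dtype=float)
    area = polygon_area(pts)
    perimeter = polygon_perimeter(pts)
    (x_min, y_min), (x_max, y_max) = pts.min(axis=0), pts.max(axis=0)
    return {
        "area": area,
        "perimeter": perimeter,
        "equivalent_diameter": math.sqrt(4 * area / math.pi),
        "roundness": 4 * math.pi * area / perimeter ** 2,        # 1 for a circle
        "solidity": area / polygon_area(convex_hull(pts)),       # 1 for convex shapes
        "extent": area / ((x_max - x_min) * (y_max - y_min)),    # fill of bounding box
    }
```

For a 10 × 10 square the roundness is π/4 ≈ 0.785 and the solidity and extent are both 1, while a concave L-shaped contour has a solidity below 1 because its convex hull is larger than the shape itself.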

2.5.4. Color-Based Features

The color-based features, namely Hue, Saturation, and Value (HSV), are extracted for the region inside the bounding box of a rice grain. HSV is an alternative representation of the RGB model that is closely aligned with the way humans perceive color-making attributes: H describes the pure color, S measures the degree to which the pure color is diluted by white light, and V represents the intensity, i.e., the relative brightness of the color. The R, G, B values are converted to HSV values using Equation (5):

V = M = \max(R, G, B), \qquad m = \min(R, G, B)

S = \begin{cases} \frac{M - m}{M}, & \text{if } M > 0 \\ 0, & \text{if } M = 0 \end{cases}

H = \begin{cases} 60 \times \left(0 + \frac{G - B}{M - m}\right), & \text{if } M = R \\ 60 \times \left(2 + \frac{B - R}{M - m}\right), & \text{if } M = G \\ 60 \times \left(4 + \frac{R - G}{M - m}\right), & \text{if } M = B \end{cases} \qquad \text{with } H = H + 360 \text{ if } H < 0

(5)

where M and m are the maximum and minimum of the three channel values; for achromatic pixels (M = m), H is undefined and is conventionally set to 0.
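Equation (5) translates directly into a per-pixel function, sketched below in plain Python (the name `rgb_to_hsv` is hypothetical). It returns H in degrees, S in [0, 1], and V on the input scale. Note that OpenCV's `cv2.cvtColor(img, cv2.COLOR_BGR2HSV)` on 8-bit images stores H/2 (range 0 to 180) and scales S to 0 to 255, so values from that function must be rescaled before comparison with this sketch.

```python
def rgb_to_hsv(r: float, g: float, b: float) -> tuple[float, float, float]:
    """Convert one RGB pixel to (H in degrees, S in [0, 1], V) following Eq. (5)."""
    M, m = max(r, g, b), min(r, g, b)
    v = M
    s = (M - m) / M if M > 0 else 0.0
    if M == m:
        h = 0.0                                   # achromatic: hue undefined, set to 0
    elif M == r:
        h = 60 * (0 + (g - b) / (M - m))
    elif M == g:
        h = 60 * (2 + (b - r) / (M - m))
    else:                                         # M == b
        h = 60 * (4 + (r - g) / (M - m))
    if h < 0:
        h += 360                                  # wrap negative hues into [0, 360)
    return h, s, v
```

For example, pure red, green, and blue map to hues of 0°, 120°, and 240°, and magenta (255, 0, 255) gives −60° before wrapping, i.e. 300°.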

Another color-based feature is the histogram, a graphical representation of the distribution of intensity levels in each of the six channels (R, G, B, H, S, V) of the image. A histogram plots the number of pixels at each intensity level. For each segmented rice grain image, six histograms are computed, one per channel (Supplementary Figure S6). Since intensity values range from 0 to 255, a single channel has 256 possible levels; to reduce computational time, these are quantized to 48 levels, giving a 48-bin histogram for each of the six channels. A total of 288 (48 × 6) histogram features are thus calculated as follows:

hist(i) = \sum_{x = 1}^{N} \sum_{y = 1}^{M} \begin{cases} 1, & \text{if } I(x, y) = i \\ 0, & \text{otherwise} \end{cases}, \qquad i \in [1, 48]

(6)

where I(x, y) is the quantized intensity of the pixel at location (x, y) and (N, M) is the image resolution.
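A minimal sketch of the 48-bin quantization and the 288-element histogram vector is shown below. It assumes all six channels are already on a 0 to 255 scale (so an OpenCV hue channel, which spans 0 to 180, would first need rescaling) and uses 0-based bins instead of the 1-based index i of Equation (6); the helper names are hypothetical.

```python
import numpy as np


def channel_histogram(channel: np.ndarray, bins: int = 48) -> np.ndarray:
    """Quantize 0-255 intensities into `bins` levels and count pixels per level (Eq. 6)."""
    quantized = (channel.astype(np.int64) * bins) // 256   # maps 0..255 -> 0..bins-1
    return np.bincount(quantized.ravel(), minlength=bins)


def histogram_features(rgb: np.ndarray, hsv: np.ndarray, bins: int = 48) -> np.ndarray:
    """Concatenate the R, G, B, H, S, V histograms into one feature vector."""
    channels = [rgb[..., c] for c in range(3)] + [hsv[..., c] for c in range(3)]
    return np.concatenate([channel_histogram(ch, bins) for ch in channels])
```

Each 48-bin block of the returned vector sums to the number of pixels in the grain image, and the full vector has 6 × 48 = 288 entries.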

2.5.5. Textural Features

The texture of rice grains determines the hardness, adhesiveness, cohesiveness, gumminess, springiness, resilience, and chewiness of cooked rice. For a rice grain image, texture is measured in terms of the spatial arrangement of intensities relative to the neighborhood of a given pixel, and of properties such as fineness/coarseness, smoothness, granulation, randomness, and irregularity. In this study, the gray level co-occurrence matrix (GLCM) is used to determine the textural features of the image. The GLCM gives the distribution of co-occurring intensity values at a given offset by counting how often a pixel with intensity i occurs adjacent to a pixel with intensity j. Let I(x, y) be the intensity of the pixel at location (x, y) in the image; the GLCM G is defined as

G(i, j) = \sum_{x = 1}^{N} \sum_{y = 1}^{M} \begin{cases} 1, & \text{if } I(x, y) = i \text{ and } I(x + d_{1}, y + d_{2}) = j \\ 0, & \text{otherwise} \end{cases}

(7)

where d_{1} and d_{2} are the pixel offset distances, as shown in Supplementary Figure S7, and (N, M) is the image resolution (width, height). Four angles are selected for analysis, 0°, 45°, 90°, and 135°, and the values of d_{1} and d_{2} corresponding to these angles are listed alongside these directions in the coordinate frame (Supplementary Figure S7). The size of the GLCM depends on the number of intensity levels, not on the image resolution; in this study, the GLCM is 256 × 256 because there are 256 intensity levels. The textural features are contrast (CON) and the statistical features correlation (CORR), homogeneity (HO), angular second moment (ASM), energy (EN), and dissimilarity (DIS), each computed for the four orientations. These parameters form part of the dataset used by the machine learning models to segregate rice based on its quality.
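Equation (7) can be sketched with vectorized slicing: the image is paired with a copy of itself shifted by (d1, d2), and each pair of intensities increments one cell of G. The offsets in `OFFSETS` are hypothetical choices (x to the right, y pointing down the image rows); the paper's actual values are given in Supplementary Figure S7. The paper used scikit-image (`skimage.feature.graycomatrix`), whose angle conventions may differ from this sketch.

```python
import numpy as np

# Hypothetical (d1, d2) offsets per angle, with the image indexed as image[y, x]
# and y increasing downwards; Supplementary Figure S7 lists the paper's values.
OFFSETS = {0: (1, 0), 45: (1, -1), 90: (0, -1), 135: (-1, -1)}


def glcm(image: np.ndarray, d1: int, d2: int, levels: int = 256) -> np.ndarray:
    """Count pairs (I(x, y), I(x + d1, y + d2)) as in Eq. (7).

    `image` must hold integer intensities in [0, levels).
    """
    h, w = image.shape
    # Keep only reference pixels whose neighbor at the offset lies inside the image
    y0, y1 = max(0, -d2), min(h, h - d2)
    x0, x1 = max(0, -d1), min(w, w - d1)
    ref = image[y0:y1, x0:x1]
    nbr = image[y0 + d2:y1 + d2, x0 + d1:x1 + d1]
    g = np.zeros((levels, levels), dtype=np.int64)
    np.add.at(g, (ref.ravel(), nbr.ravel()), 1)   # unbuffered increment handles repeats
    return g
```

The matrix is always levels × levels regardless of the image size; only the total count of pairs (here, for the 0° offset, height × (width − 1)) depends on the resolution.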

2.5.6. Statistical Features Using GLCM

After the GLCM is derived, six statistical features, namely contrast, dissimilarity, correlation, homogeneity, energy, and angular second moment, are calculated using the scikit-image Python library [20]. These statistical features are described as follows:

Contrast (CON) measures the intensity difference between a pixel and its neighbor over the entire image; for an image with a constant intensity value, the contrast is zero. The contrast is calculated as

CON = \sum_{i, j}^{N_{G}} (i - j)^{2}\, p(i, j)

(8)

where p(i, j) = G(i, j) / \sum_{i, j} G(i, j) is the normalized GLCM and N_{G} is the number of intensity levels in the image.

Correlation (CORR) measures how strongly a pixel is correlated with its neighbor over the entire image. Its value ranges from −1 to 1, and it is given by

CORR = \sum_{i, j}^{N_{G}} \frac{(i - μ_{x})(j - μ_{y})\, p(i, j)}{σ_{x} σ_{y}}

(9)

where μ_{x} and μ_{y} are the means and σ_{x} and σ_{y} are the standard deviations of the marginal distributions:

μ_{x} = \sum_{i = 1}^{N_{G}} i\, ρ_{i}^{x}, \qquad μ_{y} = \sum_{j = 1}^{N_{G}} j\, ρ_{j}^{y}; \qquad σ_{x}^{2} = \sum_{i = 1}^{N_{G}} (i - μ_{x})^{2} ρ_{i}^{x}, \qquad σ_{y}^{2} = \sum_{j = 1}^{N_{G}} (j - μ_{y})^{2} ρ_{j}^{y}.

Here, ρ_{i}^{x} and ρ_{j}^{y} are the marginal probability distributions, given by

ρ_{i}^{x} = \sum_{j = 1}^{N_{G}} p(i, j), \qquad ρ_{j}^{y} = \sum_{i = 1}^{N_{G}} p(i, j).

Homogeneity (HO) measures how closely the elements of the GLCM are distributed about the GLCM diagonal, and is given by

HO = \sum_{i, j}^{N_{G}} \frac{p(i, j)}{1 + |i - j|}

(10)

Angular Second Moment (ASM) is the sum of the squares of all elements of the normalized GLCM, and is calculated using the following equation:

ASM = \sum_{i = 1}^{N_{G}} \sum_{j = 1}^{N_{G}} p(i, j)^{2}

(11)

Energy (EN) is the square root of the angular second moment, and is given by

EN = \sqrt{\sum_{i, j}^{N_{G}} p(i, j)^{2}} = \sqrt{ASM}

(12)

Dissimilarity (DIS) also measures the difference in intensity levels between neighboring pixels; unlike contrast, however, it weights the difference linearly rather than quadratically. It is given by the following expression:

DIS = \sum_{i = 1}^{N_{G}} \sum_{j = 1}^{N_{G}} |i - j|\, p(i, j)

(13)

A segmented rice grain image and its corresponding maps of the six statistical features are shown in Supplementary Figure S8.
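The six statistics of Equations (8) to (13) can be computed from any GLCM of raw counts, as sketched below (`glcm_statistics` is a hypothetical name). Homogeneity follows Equation (10) as written, with a 1 + |i − j| denominator; scikit-image's `graycoprops` uses 1 + (i − j)² for its "homogeneity" property, so values from the two will differ. The value returned for correlation when a marginal standard deviation is zero (constant image) is an arbitrary convention of this sketch.

```python
import numpy as np


def glcm_statistics(g: np.ndarray) -> dict:
    """Six texture statistics of Eqs. (8)-(13) from a square GLCM of raw counts."""
    p = g / g.sum()                                   # normalized GLCM p(i, j)
    i, j = np.indices(p.shape)
    levels = np.arange(p.shape[0])
    px, py = p.sum(axis=1), p.sum(axis=0)             # marginal distributions
    mu_x, mu_y = (levels * px).sum(), (levels * py).sum()
    sd_x = np.sqrt(((levels - mu_x) ** 2 * px).sum())
    sd_y = np.sqrt(((levels - mu_y) ** 2 * py).sum())
    # Correlation is undefined for zero spread; 1.0 is an assumed convention here
    corr = ((i - mu_x) * (j - mu_y) * p).sum() / (sd_x * sd_y) if sd_x > 0 and sd_y > 0 else 1.0
    asm = float((p ** 2).sum())
    return {
        "contrast": float(((i - j) ** 2 * p).sum()),                # Eq. (8)
        "correlation": float(corr),                                 # Eq. (9)
        "homogeneity": float((p / (1 + np.abs(i - j))).sum()),      # Eq. (10)
        "asm": asm,                                                 # Eq. (11)
        "energy": float(np.sqrt(asm)),                              # Eq. (12)
        "dissimilarity": float((np.abs(i - j) * p).sum()),          # Eq. (13)
    }
```

A checkerboard-like GLCM with all mass off the diagonal gives a correlation of −1, whereas a purely diagonal GLCM (every pixel equal to its neighbor) gives a correlation of 1 and zero contrast.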

2.5.7. Feature Dataset

For each segmented rice grain image, a feature dataset of 324 features is constructed, comprising 18 geometrical and morphological features, 282 color features, and 24 textural features (six texture features in each of four angular directions). The complete feature set is summarized in Supplementary Table S3. The features are stored in tabular form with M rows and N columns: each row corresponds to a unique rice grain image, and the features of each image are stored as a 1 × N vector, i.e., N features per rice grain image. The shape of the feature dataset is therefore M × N = 3081 × 324.

2.6. Experiments

2.6.1. Model Training

The models are trained on the features extracted from the collected image dataset, and the final model is tested on the remaining samples before being used to predict the type and amount of adulteration in a rice sample. The data are divided using stratified k-fold cross-validation: the dataset is split into k equal parts (folds), each preserving the class proportions of the full dataset; one part is used for testing the model, and the remaining k−1 parts are used for training. This process is repeated k times, and the evaluation metrics (i.e., the accuracy and F1 score) are averaged over the k iterations. The scikit-learn Python library is used to implement the ML algorithms.
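The stratified splitting described above can be illustrated with a small NumPy generator. This is a sketch of the idea, not scikit-learn's code: `stratified_kfold` is a hypothetical name, and the round-robin assignment with a running offset is one simple way (assumed here) to keep every fold close to the overall class proportions. In scikit-learn the equivalent is `StratifiedKFold(n_splits=10, shuffle=True, random_state=...)`.

```python
import numpy as np


def stratified_kfold(y, k: int, seed: int = 0):
    """Yield (train_idx, test_idx) for k folds that each keep y's class proportions."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(y), dtype=int)
    offset = 0
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))   # shuffle within the class
        # Deal this class's samples round-robin over the folds; the running offset
        # keeps the leftover samples of different classes from piling into fold 0.
        fold_of[idx] = (np.arange(len(idx)) + offset) % k
        offset += len(idx)
    for f in range(k):
        yield np.flatnonzero(fold_of != f), np.flatnonzero(fold_of == f)
```

With k = 10, the model would be fitted on each `train_idx`, scored on the matching `test_idx`, and the ten accuracy and F1 values averaged.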

2.6.2. Machine Learning (ML) Models

To classify rice grains by variety, eight ML models are adopted, namely logistic regression (LR), decision tree (DT), random forest (RF), multilayer perceptron (MLP), and support vector machine (SVM) with linear, polynomial, radial basis function (RBF), and sigmoid kernels. For brevity, only the RF and DT models are described in detail, the former being the best-performing model; for the remaining models, only the results are presented.

2.6.3. Decision Tree (DT) Classifier

The DT model is used for predictive analysis. At each level (node), the DT classifies (splits) the data using a threshold on a particular training feature; the feature and threshold are selected to minimize the optimization criterion (entropy or Gini impurity). Since this is a relatively simple classification problem, a lightweight model is chosen that minimizes prediction time while maintaining accuracy. The DT classifier has the lowest inference time among tree-based classification algorithms, as its prediction cost depends only on the depth of the tree. For a given classification task, certain features are more relevant than others, and building the model on these features increases its overall accuracy, because discarding irrelevant features helps prevent overfitting to the training data. In addition, fewer features mean that less training time is required. The scikit-learn implementation of the DT model also reports the importance of every feature as a value between 0 and 1 that specifies its relative importance. The number of features is chosen by training the model with various numbers of features and selecting the smallest number that gives the desired accuracy. Here, the top 59 features are selected based on a threshold on the feature importance value; this threshold is determined iteratively to optimize model accuracy. The top 40 features are plotted in Supplementary Figure S9. The majority of these features correspond to the RGB or HSV channels.
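The split rule at a single DT node, choosing the feature and threshold that minimize the weighted Gini impurity, can be sketched as an exhaustive search. This is illustrative only (the helper names are hypothetical); it mirrors the criterion selected by `criterion="gini"` in scikit-learn's `DecisionTreeClassifier` but not that library's optimized search. Candidate thresholds are taken as midpoints between consecutive distinct feature values.

```python
import numpy as np


def gini(labels: np.ndarray) -> float:
    """Gini impurity 1 - sum_c p_c^2 of a label array."""
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return float(1.0 - (p ** 2).sum())


def best_split(X: np.ndarray, y: np.ndarray):
    """Return (feature, threshold, weighted_impurity) minimizing the weighted Gini impurity."""
    best = (None, None, gini(y))          # keep "no split" unless impurity decreases
    n = len(y)
    for f in range(X.shape[1]):
        values = np.unique(X[:, f])
        for t in (values[:-1] + values[1:]) / 2:   # midpoints between distinct values
            left = X[:, f] <= t
            score = (left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])) / n
            if score < best[2]:
                best = (f, float(t), score)
    return best
```

A full tree applies this search recursively to each child node until a depth limit or a minimum node size is reached.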

2.6.4. Random Forest (RF) Classifier

RF is an ensemble learning technique whose basic unit is a DT, which splits the data at each node on the feature threshold that minimizes the optimization criterion (entropy or Gini impurity). However, single decision trees tend to overfit the training data and therefore have lower testing accuracy. Hence, the RF classifier uses bagging (bootstrap aggregation), training each tree on a random bootstrap sample of the data, which introduces randomness into the model and acts as a form of regularization. The RF classifier consists of a certain number (usually 100 or more) of DT classifiers. Each DT makes its own prediction, and in a classification task the final prediction is the mode (majority vote) of all the predictions; scikit-learn's implementation combines the trees by averaging their predicted class probabilities rather than by a hard vote. The Python library scikit-learn is used to train the RF classifier [20].
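The bagging-and-voting idea can be sketched with depth-1 trees (decision stumps) as the base learners. This is a deliberately simplified illustration under stated assumptions: a real random forest grows deeper trees and also samples a random subset of features at each split, and scikit-learn averages class probabilities instead of taking a hard vote. `fit_stump` and `bagged_predict` are hypothetical names, and labels are assumed to be non-negative integers.

```python
import numpy as np


def fit_stump(X, y):
    """Depth-1 tree: best single Gini split, each side predicting its majority class."""
    def gini(lab):
        _, c = np.unique(lab, return_counts=True)
        return 1.0 - ((c / lab.size) ** 2).sum()

    def majority(lab):
        return np.bincount(lab).argmax()

    best = (gini(y), None, None)
    for f in range(X.shape[1]):
        v = np.unique(X[:, f])
        for t in (v[:-1] + v[1:]) / 2:
            left = X[:, f] <= t
            score = (left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])) / len(y)
            if score < best[0]:
                best = (score, f, t)
    _, f, t = best
    if f is None:                                   # no useful split: constant prediction
        c = majority(y)
        return lambda Z: np.full(len(Z), c)
    lo, hi = majority(y[X[:, f] <= t]), majority(y[X[:, f] > t])
    return lambda Z: np.where(Z[:, f] <= t, lo, hi)


def bagged_predict(X, y, X_new, n_estimators=25, seed=0):
    """Bagging: fit each stump on a bootstrap sample, then take the mode of the votes."""
    rng = np.random.default_rng(seed)
    votes = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(y), len(y))       # draw n rows with replacement
        votes.append(fit_stump(X[idx], y[idx])(X_new))
    votes = np.array(votes)                         # shape (n_estimators, n_new)
    return np.array([np.bincount(col).argmax() for col in votes.T])
```

Because each stump sees a slightly different resample, individual stumps may disagree, and the majority vote smooths out their errors.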

The RF classifier has several hyperparameters that are tuned to improve model accuracy, such as (a) the number of decision trees, (b) the depth of each DT, (c) the minimum number of samples required to split an internal node, and (d) the minimum number of samples required at a leaf node. Hyperparameter tuning is performed with the grid search method, in which every combination of the candidate values of the hyperparameters is evaluated to determine the best combination.
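Grid search amounts to an exhaustive loop over the Cartesian product of candidate values, as sketched below. The parameter names follow scikit-learn's `RandomForestClassifier` (`n_estimators`, `max_depth`, `min_samples_split`, `min_samples_leaf`), but the candidate values in `EXAMPLE_GRID` are hypothetical, and `evaluate` stands for whatever scoring routine is used (for example, mean cross-validated macro F1).

```python
from itertools import product

# Hypothetical candidate values; the keys follow scikit-learn's RandomForestClassifier.
EXAMPLE_GRID = {
    "n_estimators": [100, 200, 400],
    "max_depth": [None, 10, 20],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 2],
}


def grid_search(param_grid: dict, evaluate):
    """Evaluate every combination in param_grid; return (best_score, best_params)."""
    names = list(param_grid)
    best_score, best_params = float("-inf"), None
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)   # e.g. mean cross-validated F1 for these settings
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params
```

The example grid already has 3 × 3 × 2 × 2 = 36 combinations, each requiring a full cross-validation run, which is why grids are usually kept coarse. In scikit-learn, `GridSearchCV(RandomForestClassifier(), param_grid, scoring="f1_macro", cv=StratifiedKFold(n_splits=10))` performs the same search with cross-validation built in.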

3. Model Comparisons and Discussion

3.1. Performance Metrics

The key objective of this study is to identify an ML model that enables accurate, rapid, and cost-effective classification of rice grains by variety. This is a multi-class classification problem in which the models are compared in terms of accuracy, precision, recall, and macro-averaged F1-score. The accuracy is analyzed using a confusion matrix, which summarizes the performance of a model for each rice variety: for each sample in the test dataset, the confusion matrix records the actual class of the rice sample and the class predicted by the respective ML model. Precision is the fraction of samples predicted as a given class that actually belong to that class, and recall is the fraction of samples of a given class that the model correctly assigns to that class. Precision and recall are mathematically expressed as:

\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}

(14)

\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}

(15)

The precision–recall trade-off depends on the probability threshold used for prediction. Hence, the F1-score (see Equation (16)), the harmonic mean of precision and recall, is used as a metric of overall performance. The performance metrics of each model are detailed in Table 1 for the image dataset analyzed in this study.

\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}

(16)
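The metrics of Equations (14) to (16) can all be read off a confusion matrix, as the following NumPy sketch shows. The function names are hypothetical, and reporting 0 when a denominator is zero is an assumed convention (scikit-learn exposes this choice through its `zero_division` parameter).

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes: int) -> np.ndarray:
    """cm[a, p] counts samples of actual class a that were predicted as class p."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def classification_metrics(cm: np.ndarray) -> dict:
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp      # predicted as the class but actually another class
    fn = cm.sum(axis=1) - tp      # belonging to the class but predicted as another
    with np.errstate(divide="ignore", invalid="ignore"):
        precision = np.where(tp + fp > 0, tp / (tp + fp), 0.0)              # Eq. (14)
        recall = np.where(tp + fn > 0, tp / (tp + fn), 0.0)                 # Eq. (15)
        f1 = np.where(precision + recall > 0,
                      2 * precision * recall / (precision + recall), 0.0)   # Eq. (16)
    return {"accuracy": tp.sum() / cm.sum(), "precision": precision,
            "recall": recall, "f1": f1, "macro_f1": f1.mean()}
```

The macro-averaged F1 is the unweighted mean of the per-class F1 values, so every rice variety counts equally regardless of how many test samples it has.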

The results in Table 1 show that SVM with the RBF kernel has the best accuracy score of 0.773. However, the accuracy score is influenced by classes with higher support, i.e., a higher number of samples in the test set: a model can achieve high accuracy by correctly classifying the classes with higher support even if it fails on classes with lower support. Moreover, because the model is intended to detect adulterants that may be present in imbalanced amounts, both precision and recall matter, so the F1 score takes precedence over the accuracy score. The macro-averaged F1 score treats all classes equally, which makes it a more reliable performance indicator for an imbalanced dataset. Hence, the RF classifier, which achieves the best F1-score of 0.761, is chosen as the best classifier for segregating rice varieties in an adulterated sample.
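A small hypothetical example makes the argument concrete. Suppose a test set contains 18 grains of a majority variety (label 0) and 2 grains of a minority variety (label 1), and a degenerate model labels every grain as the majority variety. The sketch below (function name hypothetical) uses the identity F1 = 2TP / (2TP + FP + FN), which is equivalent to the harmonic mean in Equation (16).

```python
import numpy as np


def accuracy_and_macro_f1(y_true, y_pred):
    """Return (accuracy, macro-averaged F1) for integer class labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    f1s = []
    for c in np.union1d(y_true, y_pred):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0)
    return float(np.mean(y_true == y_pred)), float(np.mean(f1s))


# Hypothetical test set: 18 majority-variety grains (0) and 2 minority-variety grains (1);
# the model predicts the majority variety for every grain.
acc, macro_f1 = accuracy_and_macro_f1([0] * 18 + [1] * 2, [0] * 20)
print(f"accuracy = {acc:.3f}, macro F1 = {macro_f1:.3f}")  # accuracy = 0.900, macro F1 = 0.474
```

The model never detects the minority variety, yet its accuracy is 90 percent; the macro F1 of about 0.47 exposes the failure because the minority class contributes an F1 of zero.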

3.2. Receiver Operating Characteristic (ROC) Curves

ROC curves are plotted for all eight rice varieties and for each of the eight ML algorithms: MLP, RF, LR, SVM-RBF, DT, SVM-linear, SVM-polynomial, and SVM-sigmoid. Figure 2 shows the ROC curves of the eight algorithms. Since ROC curves are defined for binary classification problems, we used one-vs-rest classifiers to obtain an ROC curve for each rice type. The ROC curves helped us choose the best algorithm for the classification task and indicate the performance of a particular model with respect to a given class. An ROC curve plots the true-positive rate (TPR) for a particular rice type against the false-positive rate (FPR) as the threshold value varies. Here, the threshold ranges from 0 to 1 and is the minimum predicted probability for a sample to be assigned to the respective class. Ideally, a classifier has a TPR close to 1 and an FPR close to 0; because lowering the threshold increases both rates, there is a trade-off between them, and the ROC curve closest to the point (0, 1) is preferred. For HK, KB, and TB, all the models, especially RF, LR, and SVM-RBF, perform considerably well. However, LR is better for classifying BM, and RF is better for classifying TKB.
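The one-vs-rest construction can be sketched as follows: for a chosen class, its predicted probability column is the score, all other classes are treated as negatives, and the (FPR, TPR) pair is recorded as the threshold sweeps from 1 down to 0. The function names, the 101-step threshold grid, and the trapezoidal area under the curve are assumptions of this sketch; scikit-learn's `roc_curve` and `roc_auc_score` provide library versions.

```python
import numpy as np


def one_vs_rest_roc(y_true, proba, cls: int, n_thresholds: int = 101):
    """FPR/TPR points for class `cls` against all other classes.

    A sample is called positive when proba[:, cls] >= threshold.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(proba)[:, cls]
    positive = y_true == cls
    fpr, tpr = [], []
    for t in np.linspace(1.0, 0.0, n_thresholds):     # strict -> lenient thresholds
        predicted = scores >= t
        tpr.append(np.sum(predicted & positive) / np.sum(positive))
        fpr.append(np.sum(predicted & ~positive) / np.sum(~positive))
    return np.array(fpr), np.array(tpr)


def auc(fpr: np.ndarray, tpr: np.ndarray) -> float:
    """Area under the ROC curve by the trapezoidal rule (fpr must be non-decreasing)."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
```

A model that ranks every grain of the class above every other grain passes through (0, 1) and has an area of 1, whereas a model that outputs the same probability for all grains jumps straight from (0, 0) to (1, 1) and scores 0.5.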

3.3. Rice Variety Classification Using the RF Classifier

This section presents the results for the best classification model, the RF classifier. Stratified k-fold cross-validation is carried out with k = 10 splits of the dataset. Across the ten folds, the highest accuracy obtained is 0.809 and the lowest is 0.745, and the RF model achieves an average accuracy of 0.770 in classifying the eight rice types into their respective classes. The confusion matrix and the performance metrics are given in Table S5 and Table 2, respectively. The highest precision, 0.949, is obtained for BM; the confusion matrix shows that three of the grains predicted as 'Basmati' are actually 'Tibar Basmati'. The lowest precision, 0.564, is obtained for the class 'HMT Kolam': the confusion matrix indicates that a considerable number of grains predicted as 'HMT Kolam' are actually 'Wada Kolam'. This can be attributed to the similarity of their features, as both belong to the same overall type, 'Kolam'. 'HMT Kolam' also has the lowest recall, 0.590, which, according to the confusion matrix, indicates that it has features similar to 'Tukda Basmati' and 'Wada Kolam'. The highest recall, 0.908, is obtained for 'Kana Basmati', indicating that the model distinguishes it from the other rice types better than any other variety. The F1 score, the harmonic mean of the precision and recall values, gives a better understanding of the overall efficiency of the model in distinguishing the rice types. The highest F1 score, 0.923, is obtained for 'Kana Basmati', and the lowest, 0.576, for 'HMT Kolam'. The average F1 score obtained is 0.761.

3.4. Validation of the RF Classifier

The purpose of validation is to determine how effectively the model detects rice adulteration. Validation is performed on four adulterated rice samples. Each sample consists of higher-priced and lower-priced rice grains in a fixed ratio of 5:3; for example, in Sample 1, 50 grains of the high-quality ‘Basmati’ variety are mixed with 30 grains of the low-quality ‘Tibar Basmati’ (TB) variety. This ratio is kept constant to ensure a fair comparison of the model’s performance across the four samples. An image of each mixture is captured and classified by the model, and the validation results for the four samples are detailed in Supplementary Table S4. In addition, a price factor is calculated from the price of each predicted rice grain. Because the weight of individual grains is not known, the average length and width of each rice type are used to obtain the price factors. The predicted price factor is compared with the actual price factor of the mixture and with the price factor without adulteration for each sample, which indicates whether the model is able to determine the actual price of the grains:

Actual price factor = \sum_{i} p_{i} L_{i} l_{i}^{2} N_{i}    (17)

Predicted price factor = \sum_{i} p_{i} L_{i} l_{i}^{2} M_{i}    (18)

Price factor without adulteration = p_{k} L_{k} l_{k}^{2} \sum_{i} N_{i}, \quad k = \arg\max_{i} N_{i}    (19)

where N_{i} is the number of grains of the i-th rice type in the mixture, M_{i} is the number of grains predicted by the model to be of the i-th type, L_{i} and l_{i} are the average length and width of grains of the i-th type, and p_{i} is the market price per kg of the i-th type. The product L_{i} l_{i}^{2} serves as a proxy for grain mass, since the weight of individual grains is not measured. The rate at which lower-quality grains are detected indicates whether the model can identify adulteration in a rice sample. From the validation results, it may be concluded that, although the model does not classify every grain correctly, it estimates the proportion of higher- and lower-quality rice with good accuracy. Additionally, there is a significant difference between the predicted price factor and the price factor without adulteration, which indicates the presence of adulteration in the form of lower-quality grains mixed with high-quality grains. Price is also a major factor in determining the quality of rice samples, and the model predicts the actual price factor of the four samples with an error of at most 15%.
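Equations (17) to (19) can be sketched as a small function. All numbers below (prices, dimensions and predicted counts) are hypothetical values chosen for illustration and are not taken from Supplementary Table S4; only the 50:30 composition follows Sample 1. The function name `price_factors` is likewise hypothetical.

```python
import numpy as np


def price_factors(price, length, width, n_true, n_pred):
    """Return (actual, predicted, without-adulteration) price factors, Eqs. (17)-(19).

    All arguments are 1-D sequences indexed by rice type i.
    """
    price, length, width = map(np.asarray, (price, length, width))
    n_true, n_pred = np.asarray(n_true), np.asarray(n_pred)
    per_grain = price * length * width**2                 # p_i * L_i * l_i^2
    actual = float(np.sum(per_grain * n_true))            # Eq. (17)
    predicted = float(np.sum(per_grain * n_pred))         # Eq. (18)
    k = int(np.argmax(n_true))                            # type with the highest N_i
    no_adulteration = float(per_grain[k] * n_true.sum())  # Eq. (19)
    return actual, predicted, no_adulteration


# Hypothetical Sample 1-like mixture: index 0 = Basmati, index 1 = Tibar Basmati.
price = [120.0, 70.0]   # price per kg (hypothetical)
length = [7.5, 6.8]     # average grain length, mm (hypothetical)
width = [1.8, 1.7]      # average grain width, mm (hypothetical)
n_true = [50, 30]       # grains actually in the mixture (5:3 ratio)
n_pred = [47, 33]       # grains predicted per type (hypothetical)

actual, predicted, no_adulteration = price_factors(price, length, width, n_true, n_pred)
rel_error = abs(predicted - actual) / actual
print(f"actual={actual:.1f} predicted={predicted:.1f} "
      f"without adulteration={no_adulteration:.1f} error={rel_error:.1%}")
```

Because L_{i} l_{i}^{2} only stands in for grain mass, the factors are meaningful relative to one another (for example, as a percentage error, or as the gap between the predicted factor and the factor without adulteration), not as prices in currency.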

3.5. Qualitative Comparison of ML Models

Supplementary Figure S10 shows a qualitative comparison of the eight machine learning algorithms for Sample 1. It visualizes how well each model classifies the grains and detects adulteration in the form of lower-quality rice. Sample 1 consists of 50 grains of ‘Basmati’ (the higher-quality rice) and 30 grains of ‘Tibar Basmati’ (the lower-quality rice). Most of the models are able to identify the lower-quality rice; however, some grains are also assigned to other rice varieties because of structural similarities between varieties. These figures show that the RF classifier has the lowest misclassification rate, which is consistent with its having the highest F1-score of the eight models (Table 1).
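Figure S10 is qualitative. Assuming the true variety of every grain in the mixture is known, one way to put numbers on it is to compute the misclassification rate and the predicted share of lower-quality grains. The predicted labels below are hypothetical and are not the authors' results.

```python
from collections import Counter

# Hypothetical ground truth and predictions for an 80-grain, Sample 1-like mixture.
true_labels = ["BM"] * 50 + ["TB"] * 30
pred_labels = (["BM"] * 45 + ["TB"] * 3 + ["KB"] * 2      # predictions for the 50 true BM grains
               + ["TB"] * 26 + ["BM"] * 2 + ["TKB"] * 2)  # predictions for the 30 true TB grains

# Fraction of grains whose predicted variety differs from the true variety.
errors = sum(t != p for t, p in zip(true_labels, pred_labels))
misclassification_rate = errors / len(true_labels)

# Share of lower-quality (TB) grains: true composition vs. model estimate.
pred_counts = Counter(pred_labels)
true_tb_share = true_labels.count("TB") / len(true_labels)
pred_tb_share = pred_counts["TB"] / len(pred_labels)

print(f"misclassification rate = {misclassification_rate:.3f}")
print(f"lower-quality (TB) share: true = {true_tb_share:.3f}, predicted = {pred_tb_share:.3f}")
print(dict(pred_counts))
```

The example also illustrates the point made in the validation section: individual grains can be misclassified while the estimated proportion of lower-quality rice stays close to the true proportion.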

4. Conclusions and Future Scope

In this study, a Raspberry-Pi-based image processing module was successfully designed to detect adulteration, either by foreign particles or by the mixing of low-quality rice with high-quality rice. Images of mostly non-touching grains were captured, and the individual grains were segmented using the Watershed algorithm. Several features based on geometry, morphology, color, and texture were extracted. Subsequently, eight machine learning models were used to classify the rice type, and their performance was compared in terms of accuracy, precision, recall and F1-score. The RF classifier was found to be the best-performing algorithm, with an accuracy of 77.1% and the highest F1-score (0.761) of the eight models. To improve the current accuracy, deep learning models such as EfficientNet, InceptionV3, ResNet and MobileNet may be trained. Reinforcement learning or online training approaches may also be explored to improve the model while it is in use. In future, samples of the same varieties grown in different regions or years, and therefore exposed to different seasonal and environmental conditions, need to be included in the sample pool; including such samples during training should improve the models’ performance. Furthermore, a user interface, such as an Android app or a web app, may be developed to enable remote access and easy viewing of the results. Moreover, larger datasets could be used to train the machine learning models and to support image enhancement, classification and segmentation, leading to improved model performance. The model may also be deployed in a mobile application to evaluate the actual price of adulterated rice available in local markets.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/foods12061273/s1. Figure S1: Histogram of a representative grayscale image of rice grains. Figure S2: Images of rice grains: (a) original RGB format, (b) grayscale format after global thresholding, and (c) individually labelled rice grains obtained using the Watershed algorithm. Figure S3: Estimation of the global threshold value using Otsu’s technique. (Top) Grayscale images of rice grains at different threshold levels, with the corresponding histograms shown below each image. (Bottom) Parameters used to obtain the threshold values. Figure S4: Representative images of the segmented rice grains: (Top row) images of individual rice grains with bounding contours that were selected for the image dataset; (Bottom row) images of overlapping/touching rice grains, which were therefore omitted from the dataset. Figure S5: Representative images showing selected geometrical features (length, width and aspect ratio) calculated for segmented rice grains. Figure S6: (Left) RGB and HSV images of a segmented rice grain. (Right) Histograms corresponding to the R, G, B channels (top row) and the H, S, V channels (bottom row). Figure S7: Coordinate frame with the direction of analysis. Figure S8: Statistical features (ASM, contrast, correlation, energy, dissimilarity, and homogeneity) extracted using the GLCM for a representative segmented rice grain image. Figure S9: Key features and their importance values. Figure S10: Labelled rice grain images from different machine learning models, providing a qualitative comparison between the models. Table S1: Market price and number of grains of eight types of rice varieties. Table S2: Geometrical and morphological features of rice grains. Table S3: Summary of feature dataset. Table S4: Market price and number of grains of eight types of rice varieties. Table S5: Confusion matrix for the random forest classifier.

Author Contributions

C.K., M.M., S.K. and A.M. worked extensively on the manuscript. B.C.N., Y.Y. and H.S.R. provided direction and suggestions. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

Śliwińska-Bartel, M.; Burns, D.T.; Elliott, C. Rice Fraud a Global Problem: A Review of Analytical Tools to Detect Species, Country of Origin and Adulterations. Trends Food Sci. Technol. 2021, 116, 36–46.
Mittal, S.; Dutta, M.K.; Issac, A. Non-Destructive Image Processing Based System for Assessment of Rice Quality and Defects for Classification According to Inferred Commercial Value. Measurement 2019, 148, 106969.
Bhupendra, K.; Moses, K.; Miglani, A.; Kumar Kankar, P. Deep CNN-Based Damage Classification of Milled Rice Grains Using a High-Magnification Image Dataset. Comput. Electron. Agric. 2022, 195, 106811.
Meenu, M.; Decker, E.A.; Xu, B. Application of Vibrational Spectroscopic Techniques for Determination of Thermal Degradation of Frying Oils and Fats: A Review. Crit. Rev. Food Sci. Nutr. 2021, 62, 5744–5765.
Meenu, M.; Xu, B. Application of Vibrational Spectroscopy for Classification, Authentication and Quality Analysis of Mushroom: A Concise Review. Food Chem. 2019, 289, 545–557.
Kalra, S.; Meenu, M.; Kumar, D. Damage Detection in Eggshell Using Lamb Waves. Smart Innov. Syst. Technol. 2022, 239, 1–8.
Meenu, M.; Zhang, Y.; Kamboj, U.; Zhao, S.; Cao, L.; He, P.; Xu, B. Rapid Determination of β-Glucan Content of Hulled and Naked Oats Using near Infrared Spectroscopy Combined with Chemometrics. Foods 2021, 11, 43.
Kiratiratanapruk, K.; Temniranrat, P.; Sinthupinyo, W.; Prempree, P.; Chaitavon, K.; Porntheeraphat, S.; Prasertsak, A. Development of Paddy Rice Seed Classification Process Using Machine Learning Techniques for Automatic Grading Machine. J. Sens. 2020, 2020, 7041310.
Meenu, M.; Guha, P.; Mishra, S. Impact of Infrared Treatment on Quality and Fungal Decontamination of Mung Bean (Vigna radiata L.) Inoculated with Aspergillus spp. J. Sci. Food Agric. 2018, 98, 2770–2776.
Vithu, P.; Moses, J.A. Machine Vision System for Food Grain Quality Evaluation: A Review. Trends Food Sci. Technol. 2016, 56, 13–20.
Dhakshina Kumar, S.; Esakkirajan, S.; Bama, S.; Keerthiveena, B. A Microcontroller Based Machine Vision Approach for Tomato Grading and Sorting Using SVM Classifier. Microprocess. Microsyst. 2020, 76, 103090.
Meenu, M.; Kurade, C.; Neelapu, B.C.; Kalra, S.; Ramaswamy, H.S.; Yu, Y. A Concise Review on Food Quality Assessment Using Digital Image Processing. Trends Food Sci. Technol. 2021, 118, 106–124.
Wu, Y.; Lin, Q.; Chen, Z.; Wu, W.; Xiao, H. Fractal Analysis of the Retrogradation of Rice Starch by Digital Image Processing. J. Food Eng. 2012, 109, 182–187.
Izquierdo, M.; Lastra-Mejías, M.; González-Flores, E.; Pradana-López, S.; Cancilla, J.C.; Torrecilla, J.S. Visible Imaging to Convolutionally Discern and Authenticate Varieties of Rice and Their Derived Flours. Food Control. 2020, 110, 106971.
Xu, X.; Xu, S.; Jin, L.; Song, E. Characteristic Analysis of Otsu Threshold and Its Applications. Pattern Recognit. Lett. 2011, 32, 956–961.
Bradski, G. The OpenCV Library. Dr. Dobb’s J. Softw. Tools 2000, 120, 122–125.
Sun, H.; Yang, J.; Ren, M. A Fast Watershed Algorithm Based on Chain Code and Its Application in Image Segmentation. Pattern Recognit. Lett. 2005, 26, 1266–1274.
Pathak, B.; Barooah, D. Texture Analysis Based on the Gray-Level Cooccurrence Matrix Considering Possible Orientations. Int. J. Adv. Res. Electr. Electron. Instrum. Eng. 2013, 2, 4206–4212.
Deswal, M.; Sharma, N. A Fast HSV Image Color and Texture Detection and Image Conversion Algorithm. Int. J. Sci. Res. 2014, 3, 1279–1284.
Van der Walt, S.; Schönberger, J.L.; Nunez-Iglesias, J.; Boulogne, F.; Warner, J.D.; Yager, N.; Gouillart, E.; Yu, T. Scikit-Image: Image Processing in Python. PeerJ 2014, 2014, e453.
Haralick, R.M.; Dinstein, I.; Shanmugam, K. Textural Features for Image Classification. IEEE Trans. Syst. Man Cybern. 1973, SMC-3, 610–621.
Yang, P.; Yang, G. Feature Extraction Using Dual-Tree Complex Wavelet Transform and Gray Level Co-Occurrence Matrix. Neurocomputing 2016, 197, 212–220.
Haralick, R.M.; Sternberg, S.R.; Zhuang, X. Image Analysis Using Mathematical Morphology. IEEE Trans. Pattern Anal. Mach. Intell. 1987, PAMI-9, 532–550.
Koklu, M.; Ozkan, I.A. Multiclass Classification of Dry Beans Using Computer Vision and Machine Learning Techniques. Comput. Electron. Agric. 2020, 174, 105507.
Singh, S.; Giri, M. Comparative Study Id3, Cart And C4.5 Decision Tree Algorithm: A Survey. Int. J. Adv. Inf. Sci. Technol. 2014, 3, 47–52.
Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Pettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
LearnOpenCV. Otsu’s Thresholding Technique. Available online: https://learnopencv.com/otsus-thresholding-technique/ (accessed on 5 February 2023).

Figure 1. (a) Raspberry-Pi-based image acquisition module with integrated IR-cut camera. Actual RGB image of the rice grains is shown on the right. (b) Flowchart showing the algorithm used for the classification of rice grains and ascertaining their quality.

Figure 2. Receiver operating characteristic (ROC) curves for (a) Basmati (BM), (b) Eco Kolam (EK), (c) HMT Kolam (HK), (d) Kana Basmati (KB), (e) Sona Masuri (SM), (f) Tibar Basmati (TB), (g) Tukda Basmati (TKB), and (h) Wada Kolam (WK) varieties of rice.

Table 1. Performance metrics of eight machine learning models for classification of rice grains.

Performance Metric	MLP	RF	LR	DT	SVM RBF	SVM Linear	SVM Polynomial	SVM Sigmoid
Accuracy	0.734	0.771	0.768	0.676	0.773	0.764	0.705	0.699
Precision	0.730	0.767	0.739	0.646	0.762	0.742	0.728	0.718
Recall	0.728	0.760	0.739	0.655	0.751	0.748	0.683	0.669
F1-score	0.728	0.761	0.738	0.649	0.753	0.743	0.693	0.678

MLP, multilayer perceptron; RF, random forest; LR, logistic regression; DT, decision tree; SVM, support vector machine; RBF, radial basis function.

Table 2. Class-wise performance metrics of random forest classifier.

Performance Metric	BM	EK	HK	KB	SM	TB	TKB	WK
Recall	0.888	0.785	0.590	0.908	0.786	0.796	0.674	0.746
Precision	0.949	0.634	0.564	0.939	0.843	0.746	0.666	0.784
F1-score	0.917	0.701	0.576	0.923	0.813	0.770	0.670	0.765

BM, Basmati; EK, Eco Kolam; HK, HMT Kolam; KB, Kana Basmati; SM, Sona Masuri; TB, Tibar Basmati; TKB, Tukda Basmati; WK, Wada Kolam.
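As a quick arithmetic check of Table 2, the F1-score of each class can be recomputed from the tabulated precision and recall. Small differences in the third decimal place for some classes are expected, because the inputs are themselves rounded.

```python
# Class-wise precision and recall of the RF classifier, copied from Table 2.
recall = {"BM": 0.888, "EK": 0.785, "HK": 0.590, "KB": 0.908,
          "SM": 0.786, "TB": 0.796, "TKB": 0.674, "WK": 0.746}
precision = {"BM": 0.949, "EK": 0.634, "HK": 0.564, "KB": 0.939,
             "SM": 0.843, "TB": 0.746, "TKB": 0.666, "WK": 0.784}
reported_f1 = {"BM": 0.917, "EK": 0.701, "HK": 0.576, "KB": 0.923,
               "SM": 0.813, "TB": 0.770, "TKB": 0.670, "WK": 0.765}

# F1 is the harmonic mean of precision (P) and recall (R): 2PR / (P + R).
f1 = {c: 2 * precision[c] * recall[c] / (precision[c] + recall[c]) for c in recall}

for c in recall:
    print(f"{c}: F1 = {f1[c]:.3f} (Table 2: {reported_f1[c]:.3f})")
```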

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Kurade, C.; Meenu, M.; Kalra, S.; Miglani, A.; Neelapu, B.C.; Yu, Y.; Ramaswamy, H.S. An Automated Image Processing Module for Quality Evaluation of Milled Rice. Foods 2023, 12, 1273. https://doi.org/10.3390/foods12061273

AMA Style

Kurade C, Meenu M, Kalra S, Miglani A, Neelapu BC, Yu Y, Ramaswamy HS. An Automated Image Processing Module for Quality Evaluation of Milled Rice. Foods. 2023; 12(6):1273. https://doi.org/10.3390/foods12061273

Chicago/Turabian Style

Kurade, Chinmay, Maninder Meenu, Sahil Kalra, Ankur Miglani, Bala Chakravarthy Neelapu, Yong Yu, and Hosahalli S. Ramaswamy. 2023. "An Automated Image Processing Module for Quality Evaluation of Milled Rice" Foods 12, no. 6: 1273. https://doi.org/10.3390/foods12061273
