Utilization of FTIR and Machine Learning for Evaluating Gluten-Free Bread Contaminated with Wheat Flour

Adedeji, Akinbode A.; Okeke, Abuchi; Rady, Ahmed M.

doi:10.3390/su15118742


by Akinbode A. Adedeji ¹,*, Abuchi Okeke ¹ and Ahmed M. Rady ²

¹ Department of Biosystems and Agricultural Engineering, University of Kentucky, Lexington, KY 40506, USA

² Food, Water, Waste Research Group, Faculty of Engineering, University of Nottingham, Nottingham NG7 2RD, UK

* Author to whom correspondence should be addressed.

Sustainability 2023, 15(11), 8742; https://doi.org/10.3390/su15118742

Submission received: 21 March 2023 / Revised: 23 May 2023 / Accepted: 25 May 2023 / Published: 29 May 2023

(This article belongs to the Special Issue Sustainable Food Processing Safety and Public Health)


Abstract

In this study, Fourier-transform infrared (FTIR) spectroscopy coupled with machine learning (ML) approaches was applied to detect and quantify wheat flour (WF) contamination in gluten-free cornbread. Samples of corn flour (CF) were contaminated with WF in the range of 0–10% (0.5% increments up to 1.5%, then 1% increments). The flour samples were baked into bread using a basic bread formulation, the bread was ground to a fine particle size for homogeneity, and FTIR spectra of the ground samples were obtained and standardized before modeling. For the classification model, majority voting-based ensemble learning (combining k-nearest neighbor [KNN], random forest, and support vector classifiers) was implemented to detect WF in the cornbread samples. Based on the ensemble framework, the KNN regressor was determined to be the best predictive model for quantifying the wheat contaminant. The optimal classification model for the test set showed an F1 score, true positive rate (TPR), and false negative rate (FNR) of 1.0, 1.0, and 0.0, respectively. For the quantification models, the coefficient of determination and root mean square error for the prediction set (R²_P and RMSEP) were 0.99 and 0.34, respectively. These results show the feasibility of utilizing FTIR along with supervised learning algorithms for the rapid offline evaluation of wheat flour contamination in gluten-free products.

Keywords:

celiac disease; cornbread; ensemble learning; gluten; machine learning; wheat flour; FTIR

1. Introduction

Sustainable food production has become a global need for current and future generations to reduce reliance on pollution-generating, traditional food production. Food allergy is reported to affect 3–5% of adults and 8% of children worldwide [1]. The medical and caregiving cost of food allergies in the US in 2013 was estimated at between USD 19 billion and USD 25 billion [2]. Gluten-free products have a vast number of customers, some of whom cannot tolerate the presence of gluten in foods. Gluten proteins, including gliadin and glutenin, are the major allergens in wheat and whole wheat. Celiac disease and gluten intolerance or sensitivity are examples of the immunological or physiological consequences of consuming such allergens [3]. In 2012, it was estimated that 0.3–0.9% of the US population had celiac disease [4]. For people with gluten-related disorders, a strict gluten-free diet is essential to properly manage their reactions. The United States Food and Drug Administration (FDA) defines a gluten-free food as one containing less than 20 ppm of gluten [5]. Cross-contamination is not unusual in the food industry, and such incidents can be severe enough to cause product recalls, legal disputes, and even the collapse of food businesses. Cross-contamination in food factories can be defined as the accidental mixing of one or more ingredients of a food product [7]. It does occur in gluten-free production lines, raising the gluten content of the final products above 20 ppm. One common cause is the inadequate cleaning of processing lines that are used to manufacture both gluten-containing (i.e., wheat, barley, rye, and their crossbred varieties) and gluten-free products [6,7].
The baking process is a typical example: cross-contamination can develop through the unintentional mixing of raw materials with gluten-containing flours such as wheat, barley, or rye, or through the inadequate cleaning of processing lines that handle both gluten-containing and gluten-free products. In bakeries, gluten-containing and gluten-free flours may be processed in the same factory, where the same equipment (e.g., milling machines, kneading rolls, fermenting drums, etc.) or kitchen utensils are used for both. Cross-contamination is therefore likely if proper cleaning is not followed or if no measures are taken to verify the gluten-free status of the ingredients or processing lines. Food losses and food waste originating from cross-contaminated food are a major problem for food businesses, for the environment, and for healthcare systems, which must hospitalize patients suffering from food allergies and from foodborne and other diseases caused by adulterated or contaminated food products. Therefore, detecting cross-contamination in food processing lines is key to ensuring that products labeled as gluten-free are suitable for consumers with gluten-related disorders.

Enzyme-linked immunosorbent assay (ELISA) is the most common analytical technique for identifying gluten in contaminated food products [8]. However, the method is cumbersome and requires highly skilled personnel. The global gluten-free bakery market was estimated at USD 1.81 billion, and its total revenue is expected to reach USD 4.15 billion by 2030; the US share of this market in 2022 was USD 449.0 million [9]. Considering this relatively large production and the demand for gluten-free baked products, fast and accurate inspection at all processing steps is crucial to ensure safety and meet quality control targets.

Nondestructive methods based on optical sensors have been successfully applied for the quality evaluation of perishable and processed food products [10]. Several studies have investigated optical sensing for determining the quality characteristics and/or gluten content of grain flours and bread. Sørensen [11] evaluated nutrients in a wide range of commercially produced bread varieties using near-infrared spectroscopy (NIRS), obtaining coefficient of determination (R²) values ranging from 0.76 to 0.99 for the different food constituents analyzed. Low gluten concentrations (<4.5%, w/w) in gluten-free flour and batter were identified using NIRS (1100–2500 nm) and modified partial least squares (MPLS), with validation R² values of up to 0.967 for the flour and 0.825 for the batter [12]. Ahmad et al. [13] applied fluorescence spectroscopy with excitation wavelengths (λ_Excitation) of 270–550 nm and emission wavelengths (λ_Emission) of 310–590 nm to detect low gluten levels (<5%, w/w) in commercial gluten-free flour. The partial least squares regression (PLSR) prediction models had R² values of 0.90 and a root mean square error of prediction (RMSEP) as low as 0.46%. Chen et al. [14] quantified the total protein and wet gluten in wheat flour using NIRS (800–3030 nm), synergy interval support vector regression (siSVR), and siPLSR, along with different spectral preprocessing methods. The resulting prediction models had R² values as high as 0.906 for total protein and 0.850 for gluten, and RMSEP values as low as 0.925 for total protein and 1.024 for gluten.

Among optical noninvasive sensors, Fourier-transform infrared (FTIR) spectroscopy provides a rapid, accurate, and comprehensive fingerprint of the chemical composition of a food material with minimal sample preparation [15]. FTIR improves on dispersive optical systems by replacing the prism with an interferometer, which helps resolve overlapping infrared bands in complex samples [15,16]. Fourier-transform near-infrared (FT-NIR) spectroscopy (835–2502 nm) gave results comparable to dispersive NIRS instrumentation (450–2498 nm) for reliably predicting the quality attributes of wheat grain (protein, moisture, and hardness index) and flour (protein, ash, and amylose), with R² ranging from 0.80 to 0.99 for NIRS and 0.83 for FT-NIR [17]. Raman spectroscopy (300–3700 cm⁻¹), diffuse reflectance FT-NIR (3700–9000 cm⁻¹), and attenuated total reflectance (ATR) FTIR (400–4000 cm⁻¹) coupled with PLSR were compared for predicting gluten in wheat flour [18]. In cross-validation, the FTIR models showed a higher correlation coefficient (r = 0.962) than the diffuse reflectance FT-NIR (0.937) and Raman (0.9058) models. However, the relative standard error of prediction (RSEP) for the independent test set was higher for FTIR (5.69) than for diffuse reflectance FT-NIR (3.54) and Raman (3.24). Applications of FTIR in the food domain have also included assessing cholesterol [19], calcium [20], and lactose [21] in powdered milk; authenticating tea varieties [22]; assessing the quality of green tea [23] and black tea [24]; detecting talcum [25] and determining polyphenols [26] in tea powder; determining the antioxidant capacity of cocoa beans and chocolate [27]; assessing cocoa bean adulteration [28]; and evaluating adulteration [29] and defects [30] in ground and/or roasted coffee. To our knowledge, no previous study has investigated the use of FTIR for determining gluten in gluten-free bread.
Therefore, the objective of this study was to develop an FTIR spectroscopy methodology that utilizes supervised machine learning algorithms to detect and quantify wheat flour contamination in gluten-free cornbread. Advanced machine learning algorithms, including ensemble learning, were explored in this study.

2. Materials and Methods

2.1. Ingredients for Breadmaking

Corn flour (CF) and wheat flour (WF) used in this study were purchased from Bob’s Red Mill Natural Foods (Milwaukie, OR, USA). The basic bread formulation was adopted from [31]. The cornbread formulations consisted of corn flour (100%) and the following ingredients, added on a corn flour weight basis: water (70%), dried yeast (2%), salt (2%), sugar (2%), and vegetable fat (3%). Wheat flour was used as the contaminant at levels of 0–10%, in 0.5% increments up to 1.5% and in 1% increments thereafter up to 10%, which resulted in 13 different formulations.
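The contamination design can be made concrete with a short Python sketch that enumerates the 13 WF levels and computes ingredient masses on a corn flour weight basis. The helper names (`contamination_levels`, `batch`), the 100 g corn flour batch, and the treatment of WF as an addition dosed on corn flour weight (rather than a substitution for corn flour) are assumptions for illustration, not details reported for the study.

```python
# Hypothetical sketch of the formulation design in Section 2.1.
# Assumption: every ingredient, including the wheat flour (WF) contaminant,
# is dosed as a percentage of the corn flour weight.

BASE_RECIPE_PCT = {  # % of corn flour weight, from the text
    "water": 70.0,
    "dried_yeast": 2.0,
    "salt": 2.0,
    "sugar": 2.0,
    "vegetable_fat": 3.0,
}


def contamination_levels():
    """WF levels (%): 0-1.5 in 0.5% steps, then 2-10 in 1% steps."""
    fine_steps = [0.5 * i for i in range(4)]          # 0.0, 0.5, 1.0, 1.5
    coarse_steps = [float(p) for p in range(2, 11)]   # 2.0, 3.0, ..., 10.0
    return fine_steps + coarse_steps


def batch(corn_flour_g, wf_pct):
    """Ingredient masses (g) for one formulation."""
    masses = {"corn_flour": corn_flour_g}
    for name, pct in BASE_RECIPE_PCT.items():
        masses[name] = corn_flour_g * pct / 100.0
    masses["wheat_flour"] = corn_flour_g * wf_pct / 100.0
    return masses


levels = contamination_levels()
print(len(levels))  # 13
for wf in levels[:3]:
    print(wf, batch(100.0, wf))  # assumed 100 g of corn flour per batch
```

Splitting the levels into a fine and a coarse list mirrors the two increment sizes described above and makes the count of 13 formulations easy to check.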

2.2. Cornbread Baking

The CF was mixed with the formulated ingredients at each WF contamination level. A kitchen mixer (KitchenAid, Model KV25G0X, Benton Harbor, MI, USA) was used to mix the dough at two speed levels: first at 60 rpm for 1 min and then at 95 rpm for 6 min. The dough was scraped down every 2 min. Because of its batter-like consistency, the dough was poured into aluminum baking pans, proofed for 35 min at 40 °C, and then baked for 1 h at 190.6 °C in a convection oven (HR202, Hobart, Troy, OH, USA). The baked loaves were cooled for 1 h at room temperature (24 °C) and then ground to a fine particle size (<150 µm) using a commercial laboratory blender (Waring Commercial 7010BU Lab Blender) for 40 s to obtain a homogeneous mixture. In the end, 13 different bread samples (20 g each) were obtained, one for each WF addition level.

2.3. FTIR Spectra Data Pre-Processing

FTIR spectra of the ground cornbread samples were measured using an attenuated total reflectance Fourier-transform infrared (ATR-FTIR) spectrometer (Nicolet iS50, ThermoFisher Scientific, Waltham, MA, USA). The spectra were acquired in the range of 4000–450 cm⁻¹ at a spectral resolution of 4 cm⁻¹, comprising 887 wavenumbers. Each sample was scanned 32 times, and the average spectrum was recorded. The data were then pre-processed using standard scaling (SC), or standardization [32]. SC subtracts the mean of each feature (column), μ_j, from every data point X_ij and divides the result by that feature's standard deviation, σ_j, as in Equation (1), where i indexes the samples and j the wavenumbers. The transformed data therefore have a mean of 0 and a standard deviation of 1 for every feature [33]. Standardization also makes models easier to interpret: because the centered value x = 0 corresponds to X at its mean, the intercept of a linear model estimates the expected value of Y when X is at its mean.

x_{ij} = \frac{X_{ij} - \mu_j}{\sigma_j}

(1)

For the classification and prediction approaches, the data were split into a training set (70%) and a test set (30%).
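A minimal Python sketch of Equation (1) and the 70/30 split follows, assuming NumPy and scikit-learn. The data are random placeholders with 887 columns to mirror the spectral dimension; stratifying the split and fitting the scaler on the training set only are assumptions, not steps reported above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Placeholder data: rows = spectra, columns = 887 wavenumbers (random values,
# not real FTIR measurements); y = hypothetical binary labels.
X = rng.normal(loc=1.0, scale=0.2, size=(60, 887))
y = np.repeat([0, 1], 30)

# 70/30 split; stratify keeps class proportions (an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)

# Eq. (1) by hand: column-wise mean and (population) standard deviation.
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)
X_train_manual = (X_train - mu) / sigma

# Same transform with scikit-learn, fitted on the training rows only so that
# no test-set statistics leak into preprocessing.
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

print(np.allclose(X_train_manual, X_train_std))  # True
```

`StandardScaler` uses the population standard deviation, which is why the hand-written version above matches it; the test rows are scaled with the training μ and σ, so their column means will only be close to, not exactly, 0.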

2.4. Development of Machine Learning Models

Classification and regression models were developed using scikit-learn 0.22.2 (a machine learning module for Python), an open-source, robust library that provides a range of supervised and unsupervised learning algorithms and can take machine learning models from prototypes to production systems.

2.4.1. Feature Engineering

The spectral data obtained from ATR-FTIR form a high-dimensional dataset in which the number of features is much larger than the number of samples. Such data also contain redundant features, which increases the propensity of the model to overfit. Therefore, feature reduction was implemented using principal component analysis (PCA). As a dimensionality reduction algorithm, PCA maps the original features onto a new set of uncorrelated features called principal components (PCs). The number of PCs is then selected based on the desired amount of variance explained, which was set to 100% of the total variance [33,34].
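Selecting the number of PCs by explained variance could look like the following sketch, using scikit-learn's `PCA`. Because a target of exactly 100% is sensitive to floating-point round-off, the hypothetical helper `n_components_for_variance` uses a threshold just below 1.0; the synthetic rank-5 data are illustrative only and say nothing about the study's spectra.

```python
import numpy as np
from sklearn.decomposition import PCA


def n_components_for_variance(X, threshold=0.999999):
    """Smallest number of PCs whose cumulative explained variance ratio
    reaches `threshold`. A value just below 1.0 stands in for "100%",
    because trailing components only carry floating-point noise."""
    pca = PCA().fit(X)  # keep all components
    cumulative = np.cumsum(pca.explained_variance_ratio_)
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, cumulative.size)


# Synthetic rank-5 "spectra": 30 samples x 887 features built from
# 5 latent factors (illustrative only).
rng = np.random.default_rng(1)
scores = rng.normal(size=(30, 5))
loadings = rng.normal(size=(5, 887))
X = scores @ loadings

k = n_components_for_variance(X)
X_reduced = PCA(n_components=k).fit_transform(X)
print(k, X_reduced.shape)  # 5 (30, 5)
```

On real, noisy spectra a 100% target retains every component with nonzero variance, which after mean-centering is at most one fewer than the number of samples.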

2.4.2. Classification Analysis

To detect whether a bread sample was contaminated with wheat flour during the baking process, a classification model was developed by training different individual classifiers and applying an ensemble learning technique. Ensemble learning combines different learning algorithms, usually weak learners, to obtain a higher-performance meta-model. Experimental evidence indicates that ensemble learning often yields more accurate classification than a single learning algorithm [35]. In this study, majority voting-based ensemble learning was utilized, combining three supervised learning algorithms: random forest (RF), support vector machine (SVM), and k-nearest neighbor (KNN). Each individual classifier was trained on 70% of the dataset and then made a classification (vote) on each instance of the test set (the remaining 30%). Each sample was assigned to the class that received more than half of the votes. This approach is similar to the method used by Bouziane et al. [36] to predict protein secondary structure, which was found to outperform individual classifiers. The model was evaluated using the confusion matrix parameters, with emphasis on the true positive rate (TPR), false negative rate (FNR), and F1 score [33]. These parameters are defined as:

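TPR = \frac{TP}{TP + FN}

(2)

FNR = \frac{FN}{TP + FN}

(3)

F_1 = 2\,\frac{P \cdot R}{P + R}

(4)

where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively; P is the precision; and R is the recall, which equals the TPR. The precision, true negative rate (TNR), and false positive rate (FPR) are:

P = \frac{TP}{TP + FP}

(5)

TNR = \frac{TN}{TN + FP}

(6)

FPR = \frac{FP}{FP + TN}

(7)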
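The voting scheme and the metrics in Equations (2)–(7) can be sketched with scikit-learn's `VotingClassifier` (hard voting) and `confusion_matrix`. The synthetic 20-feature data stand in for the PC scores, label 1 is assumed to mark contaminated samples (the positive class), and the default hyper-parameters are placeholders, not the tuned values from the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def binary_metrics(y_true, y_pred):
    """Eqs. (2)-(7) with label 1 = contaminated (positive class).
    A ratio is nan (with a NumPy warning) if its denominator is zero."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    tpr = tp / (tp + fn)            # recall, Eq. (2)
    precision = tp / (tp + fp)      # Eq. (5)
    return {
        "TPR": tpr,
        "FNR": fn / (tp + fn),                          # Eq. (3)
        "F1": 2 * precision * tpr / (precision + tpr),  # Eq. (4)
        "TNR": tn / (tn + fp),                          # Eq. (6)
        "FPR": fp / (fp + tn),                          # Eq. (7)
    }


# Synthetic stand-in for 20 PC scores of pure (0) and contaminated (1) samples.
X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

# Hard voting: each classifier casts one vote; with three voters and two
# classes, the winning class always holds more than half of the votes.
ensemble = VotingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("svc", SVC()),
                ("knn", KNeighborsClassifier(n_neighbors=5))],
    voting="hard")
ensemble.fit(X_train, y_train)
y_pred = ensemble.predict(X_test)
print(binary_metrics(y_test, y_pred))
```

An odd number of voters on a binary problem avoids ties, which keeps the "more than half of the votes" rule well defined without a tie-breaking convention.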

2.4.3. Prediction of Gluten Level in Contaminated Cornbread

Prediction models were developed using the ensemble learning concept described in Section 2.4.2, adapted for regression analysis. Several individual regression models were combined, and the best-performing model was selected, as in Section 2.4.2, using the regression metrics described below. The supervised regression models used in this study were the k-nearest neighbors (KNN) regressor, random forest (RF) regressor, decision tree (DT) regressor, SVM regressor, and partial least squares regression (PLSR) [33]. Model performance was evaluated using the coefficient of determination and root mean square error for the training set (R²_T, RMSET) and for the test or prediction set (R²_P, RMSEP); the best training model was the one with the highest R²_T and the lowest RMSET. Cross-validation was performed to tune the hyper-parameters of the training models and determine their optimal values, and the corresponding learning curves were obtained [33] to ensure that the models were neither under- nor overfitting the data. The optimized training model was then applied to the test or prediction set, and the values of R²_P and RMSEP were reported.

2.5. ELISA Test to Determine Gluten Level in Bread Samples

A RIDASCREEN® Gliadin (R7001) ELISA test kit (AOAC International-approved) from R-Biopharm (Darmstadt, Germany) was used for the ELISA analysis. Depending on the matrix, the kit has a detection limit of 0.5 ppm gliadin (1 ppm gluten) and a quantification limit of 2.5 ppm gliadin (5 ppm gluten). The method described by Okeke [33] for gluten detection using the ELISA kit was followed.

Approximately 0.5 g of each sample was placed into a 50 mL centrifuge tube, 2.5 mL of the cocktail solution was added, and the tube was incubated for 40 min at 50 °C. Then, 7.5 mL of 80% ethanol was added, and the contents were mixed for 60 min to extract gluten. The samples were centrifuged for 10 min at 2500 × g and 25 °C. Next, 100 μL of each blank, standard, and sample solution was added to the wells and incubated for 30 min at room temperature, after which the standard and sample solutions were removed from the wells. All wells were washed three times with wash buffer. Then, 100 μL of the conjugate was added to the wells and incubated for 30 min. The conjugate was removed, and the wells were washed. Next, 50 μL of chromogen and 50 μL of substrate were added to each well and incubated for 30 min in the dark. Finally, 100 μL of stop solution was added, and the absorbance was measured at 450 nm.
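The kit limits in this section imply a factor of 2 between gliadin and gluten (0.5 ppm gliadin corresponds to 1 ppm gluten; 2.5 ppm to 5 ppm). The hypothetical helper `interpret_gliadin` below applies that factor and flags a result against the detection limit, the quantification limit, and the 20 ppm gluten-free limit. It assumes the gliadin concentration has already been read from the standard curve and corrected for the kit's dilution factor, which is not given here.

```python
# Hypothetical helper; constants come from the kit limits quoted in the text.
GLIADIN_TO_GLUTEN = 2.0        # 0.5 ppm gliadin ~ 1 ppm gluten
LOD_GLIADIN_PPM = 0.5          # limit of detection (gliadin)
LOQ_GLIADIN_PPM = 2.5          # limit of quantification (gliadin)
GLUTEN_FREE_LIMIT_PPM = 20.0   # gluten-free labeling limit (gluten)


def interpret_gliadin(gliadin_ppm):
    """Return (gluten_ppm, status, below_gluten_free_limit).

    Assumes `gliadin_ppm` is already dilution-corrected (an assumption)."""
    gluten_ppm = GLIADIN_TO_GLUTEN * gliadin_ppm
    if gliadin_ppm < LOD_GLIADIN_PPM:
        status = "below detection limit"
    elif gliadin_ppm < LOQ_GLIADIN_PPM:
        status = "detected, not quantifiable"
    else:
        status = "quantified"
    return gluten_ppm, status, gluten_ppm < GLUTEN_FREE_LIMIT_PPM


for gliadin in (0.3, 1.0, 15.0):
    print(gliadin, interpret_gliadin(gliadin))
# 15.0 -> (30.0, 'quantified', False)
```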

3. Results and Discussion

3.1. FTIR Spectra Characteristics of the Ground Cornbread Samples

An example spectrum of pure corn flour is shown in Figure 1A. Figure 1B,C display the FTIR spectra of samples contaminated with 0.5% wheat flour in raw form (flour; Figure 1B) and after baking (bread; Figure 1C). The absorbance values differ evidently between these figures in the 1860–1480 cm⁻¹ region, which includes most of the amide I (1690–1600 cm⁻¹) and amide II (1580–1480 cm⁻¹) characteristic bands that are sensitive to the secondary structure of proteins. In Figure 1B, these regions retain a smooth shape with distinct peaks attributable to C=O and N–H bonds or other potential (C–C and C–N) stretching vibrations [37]. In Figure 1C, however, the spectra in these regions appear noisy and distorted, which may be due to protein denaturation caused by heating during baking and to other conversion processes such as mixing with other ingredients (e.g., salt) [38]. Further differences can be seen in the intensities of the peaks and troughs. Figure 1D shows the spectra of baked cornbread contaminated with different levels of wheat flour (0.5–10%), which are similar to the baked cornbread spectrum in Figure 1C.
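Extracting the bands discussed above from a spectral matrix is a simple masking step. This sketch assumes an evenly spaced axis of 887 points between 4000 and 450 cm⁻¹ (the instrument's exact axis may differ) and uses random numbers in place of measured absorbances; the helper names are hypothetical.

```python
import numpy as np

# Assumed axis: 887 evenly spaced wavenumbers from 4000 to 450 cm^-1.
wavenumbers = np.linspace(4000.0, 450.0, 887)

# Band limits (cm^-1) taken from the text.
BANDS = {
    "amide_region": (1480.0, 1860.0),
    "amide_I": (1600.0, 1690.0),
    "amide_II": (1480.0, 1580.0),
}


def band_mask(wn, low, high):
    """Boolean mask of wavenumbers inside [low, high]."""
    return (wn >= low) & (wn <= high)


def band_means(spectra, wn):
    """Mean absorbance of each sample (row) within each band."""
    return {name: spectra[:, band_mask(wn, lo, hi)].mean(axis=1)
            for name, (lo, hi) in BANDS.items()}


# Placeholder absorbance matrix: 13 samples x 887 wavenumbers (random values).
spectra = np.random.default_rng(0).random((13, wavenumbers.size))
means = band_means(spectra, wavenumbers)
print({name: values.shape for name, values in means.items()})
```

Band-averaged absorbances like these offer one simple way to compare raw and baked spectra numerically over the amide I and II regions.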

3.2. Classification Analyses Results

Applying PCA reduced the number of features from 887 to 20 principal components (PCs), which explained all (100%) of the variance in the data (Figure 2). The reduced features (i.e., PCs) were used to develop a binary classification model with two classes: class 1 (pure cornbread) and class 2 (cornbread contaminated with wheat flour). Among all the classifiers used, namely the RF, SVM, and KNN classifiers and the majority voting-based ensemble that combines them, the ensemble method outperformed the individual classifiers, as shown by the confusion matrices used to assess the predictive capabilities of the classification models [33,39,40].

Table 1 and Table 2 present the confusion matrix and its parameters for the training dataset (70% of the data). The false negative rate (0), true positive rate (1), and F1 score (1) indicate that the model identified the contaminated samples perfectly. Table 3 and Table 4 present the confusion matrix parameters obtained when the model was applied to the separate test data, which constituted 30% of all samples. The model accurately classified the samples of each class, with TPR, FNR, and F1 scores of 1, 0, and 1, respectively [33]. One possible reason for the perfect classification of both the training and test sets is the small number of classes (two) in this study. Nevertheless, the result shows that the ensemble classifier learned to separate the two classes from the features of the samples used, and its performance on the test set was reliable and consistent. There is empirical evidence in the literature that ensemble learning methods compensate for the deficiencies of individual classifiers, delivering improved and more accurate results in most cases where they are applied [41]. Kim et al. [42] reported that ensemble learning improved the performance of machine learning in predicting the sugar content of various citrus species. Additionally, Oztuk et al. [43] reported that a voting-based ensemble learning method improved the detection of 19 classes of grain flours studied under rotational motion.

3.3. Prediction of Wheat Level in Cornbread

Table 5 presents the evaluation parameters for the predictive learning algorithms used, including the RF regressor, KNN regressor, decision tree, SVM regressor, PLSR, and ensemble learning. KNN and PLSR performed very similarly, with R²_T = 0.98 (KNN) and 0.99 (PLSR), R²_P = 0.99 (KNN) and 0.97 (PLSR), RMSET = 0.41 (KNN) and 0.08 (PLSR), and RMSEP = 0.34 (KNN) and 0.52 (PLSR) [33]. These results indicate that both learning algorithms can potentially quantify the level of wheat flour contamination in the cornbread samples within the range of levels used. Notably, the KNN regressor yielded a lower prediction-set error than the PLSR model. Thus, based on both R²_P (0.99) and RMSEP (0.34), KNN was selected as the best-performing learning algorithm for predicting wheat flour contamination levels in baked cornbread. Figure 3A–E show the learning curves obtained as a function of the tuned hyper-parameter values for the individual algorithms [33].

The aim of these curves was to choose the optimal hyper-parameters that balance bias and variance in order to prevent overfitting [33]. As seen in Figure 3 for the best learning algorithm (KNN), the gap between the scores on the training set and the cross-validation set is minimal. As the number of neighbors in the KNN algorithm increases, the scores remain essentially constant until the number of neighbors approaches six, after which they start dropping, lowering the prediction accuracy of the model [33]. Therefore, one to five neighbors is the most effective range for the KNN model to predict the percentage of wheat flour contamination in the cornbread [33].
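The neighbor-count analysis can be outlined with scikit-learn's `validation_curve`, which returns training and cross-validation scores for each candidate hyper-parameter value. The synthetic data, the 1–15 range, five folds, and R² as the score are assumptions for illustration; this sketch does not reproduce Figure 3.

```python
import numpy as np
from sklearn.model_selection import validation_curve
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for PC scores (X) and WF levels (y).
rng = np.random.default_rng(0)
y = rng.uniform(0, 10, size=120)
X = np.outer(y, rng.normal(size=20)) + rng.normal(scale=0.5, size=(120, 20))

param_range = list(range(1, 16))
train_scores, cv_scores = validation_curve(
    KNeighborsRegressor(), X, y,
    param_name="n_neighbors", param_range=param_range,
    cv=5, scoring="r2")

train_mean = train_scores.mean(axis=1)   # one value per n_neighbors
cv_mean = cv_scores.mean(axis=1)
gap = train_mean - cv_mean               # large gap suggests overfitting
best_k = int(param_range[int(np.argmax(cv_mean))])

for k, tr, cv_r2 in zip(param_range, train_mean, cv_mean):
    print(f"k={k:2d}  train R2={tr:.3f}  CV R2={cv_r2:.3f}")
print("best n_neighbors:", best_k)
```

Choosing k by the highest mean cross-validation score while inspecting the train–CV gap mirrors the bias–variance reasoning above; at k = 1 the training R² is exactly 1 for distinct samples, so any gap there reflects overfitting.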

Cui et al. [44] followed a similar approach to optimize the hyperparameters (i.e., bagging fraction and frequency) of a gradient-boosting decision tree ensemble, improving regression models based on FTIR data for predicting the decontamination rate of cold plasma-treated chitosan DNA films.

Table 6 provides the results of the ELISA test for the 12 wheat-contaminated cornbread samples. The results confirm that gluten remains detectable in the bread after baking, even though heat treatment during baking disrupts the protein structure, and that the contamination levels assigned to the spectra were consistent with the measured gluten contents. The threshold for gluten content was observed at the 3.5% wheat addition level (Table 6), which indicated gluten content below the 20 ppm limit for a product to be considered gluten-contaminated.

4. Conclusions

FTIR spectroscopy has long played an important role in the food industry for food safety inspection and quality assessment. In this study, we used FTIR spectroscopy coupled with supervised machine learning approaches to detect and quantify wheat flour (gluten) contamination of 0.5–10% in gluten-free cornbread, with reference to the 20 ppm gluten threshold that can elicit an allergenic reaction in wheat/gluten-sensitive individuals. This study shows that the ensemble learning method performed better than the individual supervised ML algorithms in detecting wheat/gluten contaminants in gluten-free cornbread. The KNN regressor emerged as the most promising technique for quantifying the level of WF contamination, with a prediction coefficient of determination and root mean square error of 0.99 and 0.34, respectively. Therefore, the results of this study indicate the potential and effectiveness of FTIR spectroscopy with supervised learning techniques for the rapid authentication of cornbread contaminated with wheat flour. More generally, the study can help develop a protocol for quality assurance in bakery product factories. Such a protocol would help reduce cross-contamination in the baking industry and, consequently, food waste in a food supply chain that currently suffers from high energy prices, a lack of water resources, domestic disputes, and migration. A future research consideration is the development of a handheld system and/or a mobile phone application that would allow stakeholders to obtain a fast and accurate decision about their foods by analyzing FTIR spectra, requiring minimal machine learning skills and making a quick test for wheat/gluten allergen contamination possible. Moreover, future research should consider testing other glycoproteins (milk, nuts, egg, fish, and soy).

Author Contributions

Conceptualization, A.A.A.; methodology, A.A.A., A.O. and A.M.R.; software, A.O. and A.M.R.; validation, A.A.A.; formal analysis, A.O.; investigation, A.O. and A.A.A.; resources, A.A.A.; data curation, A.O.; and University of Kentucky; writing—original draft preparation, A.O. and A.A.A.; writing—review and editing, A.A.A. and A.M.R.; visualization, A.O.; supervision, A.A.A. and A.M.R.; project administration, A.A.A.; funding acquisition, A.A.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Institute of Food and Agriculture (NIFA) U.S.D.A. Multistate Project #: 1024529.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data used in this project belong to the United States government and the administering institution, the University of Kentucky (UK), and can be requested through the UK Library.

Acknowledgments

The authors would like to acknowledge the Kentucky Agricultural Experiment Station for supporting and sponsoring this work.

Conflicts of Interest

The authors declare no conflict of interest.

References

Gendel, S.M. Comparison of international food allergen labeling regulations. Regul. Toxicol. Pharmacol. 2012, 63, 279–285. [Google Scholar] [CrossRef]
Croote, D.; Quake, S.R. Food allergen detection by mass spectrometry: The role of systems biology. NPJ Syst. Biol. Appl. 2016, 2, 16022. [Google Scholar] [CrossRef] [PubMed]
Alberto, R.-T.; Murray, J.A. Gluten-sensitive enteropathy. In Food Allergy: Adverse Reactions to Foods and Food Additives; Metcalfe, D.D., Sampson, H.A., Simon, R.A., Eds.; John Wiley & Sons: Oxford, UK, 2013; pp. 217–229. [Google Scholar]
Leonard, M.M.; Sapone, A.; Catassi, C.; Fasano, A. Celiac disease and nonceliac gluten sensitivity: A review. JAMA 2017, 318, 647–656. [Google Scholar] [CrossRef] [PubMed]
Allred, L.K.; Ritter, B.W. Recognition of gliadin and glutenin fractions in four commercial gluten assays. J. AOAC Int. 2010, 93, 190–196. [Google Scholar] [CrossRef]
Wieser, H.; Segura, V.; Ruiz-Carnicer, Á.; Sousa, C.; Comino, I. Food safety and cross-contamination of gluten-free products: A narrative review. Nutrients 2021, 13, 2244. [Google Scholar] [CrossRef]
Taylor, S.L.; Baumert, J.L. Cross-contamination of foods and implications for food allergic patients. Curr. Allergy Asthma Rep. 2010, 10, 265–270. [Google Scholar] [CrossRef] [PubMed]
Osorio, C.E.; Mejías, J.H.; Rustgi, S. Gluten detection methods and their critical role in assuring safe diets for celiac patients. Nutrients 2019, 11, 2920. [Google Scholar] [CrossRef]
Research, G.V. Gluten-free Bakery Market Size, Share & Trends Analysis Report By Product (Biscuits & Cookies, Bread, Cakes), By Distribution Channel (Online, Supermarkets & Hypermarkets), By Region, And Segment Forecasts, 2022–2030. Available online: https://www.grandviewresearch.com/industry-analysis/gluten-free-bakery-market-report#:~:text=North%20America%20dominated%20the%20gluten,due%20to%20rising%20health%20concerns (accessed on 3 February 2023).
José Blasco, E.M.G.; Da-Wen Sun, A.C.Z. Vision Systems. In Optical Monitoring of Fresh and Processed Agricultural Crops; Zude, M., Ed.; CRC Press: Boca Raton, FL, USA, 2008.
Sørensen, L. Application of reflectance near infrared spectroscopy for bread analyses. Food Chem. 2009, 113, 1318–1322.
Albanell, E.; Miñarro, B.; Carrasco, N. Detection of low-level gluten content in flour and batter by near infrared reflectance spectroscopy (NIRS). J. Cereal Sci. 2012, 56, 490–495.
Ahmad, M.H.; Nache, M.; Hitzmann, B. Potential of fluorescence spectroscopy in detection of low-levels of gluten in flour: A preliminary study. Food Control 2017, 73, 401–405.
Chen, J.; Zhu, S.; Zhao, G. Rapid determination of total protein and wet gluten in commercial wheat flour using siSVR-NIR. Food Chem. 2017, 221, 1939–1946.
Wenning, M.; Scherer, S. Identification of microorganisms by FTIR spectroscopy: Perspectives and limitations of the method. Appl. Microbiol. Biotechnol. 2013, 97, 7111–7120.
Rodriguez-Saona, L.E.; Allendorf, M.E. Use of FTIR for rapid authentication and detection of adulteration of food. Annu. Rev. Food Sci. Technol. 2011, 2, 467–483.
Armstrong, P.; Maghirang, E.; Xie, F.; Dowell, F. Comparison of dispersive and Fourier-transform NIR instruments for measuring grain and flour attributes. Appl. Eng. Agric. 2006, 22, 453–457.
Czaja, T.; Mazurek, S.; Szostak, R. Quantification of gluten in wheat flour by FT-Raman spectroscopy. Food Chem. 2016, 211, 560–563.
Chitra, J.; Ghosh, M.; Mishra, H. Rapid quantification of cholesterol in dairy powders using Fourier transform near infrared spectroscopy and chemometrics. Food Control 2017, 78, 342–349.
Wu, D.; Nie, P.; He, Y.; Bao, Y. Determination of calcium content in powdered milk using near and mid-infrared spectroscopy with variable selection and chemometrics. Food Bioprocess Technol. 2012, 5, 1402–1410.
Lei, Y.; Zhou, Q.; Zhang, Y.-L.; Chen, J.-B.; Sun, S.-Q.; Noda, I. Analysis of crystallized lactose in milk powder by Fourier-transform infrared spectroscopy combined with two-dimensional correlation infrared spectroscopy. J. Mol. Struct. 2010, 974, 88–93.
Wu, X.; Zhu, J.; Wu, B.; Sun, J.; Dai, C. Discrimination of tea varieties using FTIR spectroscopy and allied Gustafson-Kessel clustering. Comput. Electron. Agric. 2018, 147, 64–69.
Ikeda, T.; Kanaya, S.; Yonetani, T.; Kobayashi, A.; Fukusaki, E. Prediction of Japanese green tea ranking by Fourier transform near-infrared reflectance spectroscopy. J. Agric. Food Chem. 2007, 55, 9908–9912.
Ren, G.; Wang, S.; Ning, J.; Xu, R.; Wang, Y.; Xing, Z.; Wan, X.; Zhang, Z. Quantitative analysis and geographical traceability of black tea using Fourier transform near-infrared spectroscopy (FT-NIRS). Food Res. Int. 2013, 53, 822–826.
Li, X.; Zhang, Y.; He, Y. Rapid detection of talcum powder in tea using FT-IR spectroscopy coupled with chemometrics. Sci. Rep. 2016, 6, 30313.
Li, X.; Sun, C.; Luo, L.; He, Y. Determination of tea polyphenols content by infrared spectroscopy coupled with iPLS and random frog techniques. Comput. Electron. Agric. 2015, 112, 28–35.
Batista, N.N.; de Andrade, D.P.; Ramos, C.L.; Dias, D.R.; Schwan, R.F. Antioxidant capacity of cocoa beans and chocolate assessed by FTIR. Food Res. Int. 2016, 90, 313–319.
Teye, E.; Huang, X.-Y.; Lei, W.; Dai, H. Feasibility study on the use of Fourier transform near-infrared spectroscopy together with chemometrics to discriminate and quantify adulteration in cocoa beans. Food Res. Int. 2014, 55, 288–293.
Reis, N.; Franca, A.S.; Oliveira, L.S. Performance of diffuse reflectance infrared Fourier transform spectroscopy and chemometrics for detection of multiple adulterants in roasted and ground coffee. LWT-Food Sci. Technol. 2013, 53, 395–401.
Craig, A.P.; Franca, A.S.; Oliveira, L.S.; Irudayaraj, J.; Ileleji, K. Fourier transform infrared spectroscopy and near infrared spectroscopy for the quantification of defects in roasted coffees. Talanta 2015, 134, 379–386.
Mondal, A.; Datta, A. Bread baking—A review. J. Food Eng. 2008, 86, 465–474.
Varmuza, K.; Filzmoser, P. Introduction to Multivariate Statistical Analysis in Chemometrics; CRC Press: Boca Raton, FL, USA, 2009.
Okeke, A. Fourier Transform Infrared Spectroscopy (As A Rapid Method) Coupled With Machine Learning Approaches For Detection And Quantification Of Gluten Contaminations In Grain-Based Foods; University of Kentucky: Lexington, KY, USA, 2020.
Howley, T.; Madden, M.G.; O’Connell, M.-L.; Ryder, A.G. The effect of principal component analysis on machine learning accuracy with high dimensional spectral data. In Proceedings of the 25th SGAI International Conference on Innovative Techniques and Applications of Artificial Intelligence—Applications and Innovations in Intelligent Systems XIII, Cambridge, UK, 12–14 December 2005; pp. 209–222.
Dietterich, T.G. Ensemble learning. Handb. Brain Theory Neural Netw. 2002, 2, 110–125.
Bouziane, H.; Messabih, B.; Chouarfia, A. Profiles and majority voting-based ensemble method for protein secondary structure prediction. Evol. Bioinform. 2011, 7, EBO.S7931.
Surewicz, W.K.; Mantsch, H.H.; Chapman, D. Determination of protein secondary structure by Fourier transform infrared spectroscopy: A critical assessment. Biochemistry 1993, 32, 389–394.
Neill, G.; Ala’a, H.; Magee, T. Optimisation of time/temperature treatment, for heat treated soft wheat flour. J. Food Eng. 2012, 113, 422–426.
Jeong, S.; Lee, D.; Yang, G.; Kwon, H.; Kim, M.; Lee, S. Unravelling the physicochemical features of US wheat flours over the past two decades by machine learning analysis. LWT 2022, 169, 114036.
Marom, N.D.; Rokach, L.; Shmilovici, A. Using the confusion matrix for improving ensemble classifiers. In Proceedings of the 2010 IEEE 26th Convention of Electrical and Electronics Engineers in Israel, Eilat, Israel, 17–20 November 2010; pp. 000555–000559.
Olenskyj, A.G.; Donis-González, I.R.; Earles, J.M.; Bornhorst, G.M. End-to-end prediction of uniaxial compression profiles of apples during in vitro digestion using time-series micro-computed tomography and deep learning. J. Food Eng. 2022, 325, 111014.
Kim, S.-Y.; Hong, S.-J.; Kim, E.; Lee, C.-H.; Kim, G. Application of ensemble neural-network method to integrated sugar content prediction model for citrus fruit using Vis/NIR spectroscopy. J. Food Eng. 2023, 338, 111254.
Ozturk, S.; Bowler, A.; Rady, A.; Watson, N.J. Near-infrared spectroscopy and machine learning for classification of food powders during a continuous process. J. Food Eng. 2023, 341, 111339.
Cui, H.; Wang, Q.; Rai, R.; Salvi, D.; Nitin, N. DNA-based surrogates for the validation of microbial inactivation using cold atmospheric pressure plasma and plasma-activated water processing. J. Food Eng. 2023, 339, 111267.

Figure 1. Examples of FTIR spectra of (A) pure corn flour; (B) corn flour contaminated with 0.5% wheat flour (raw sample); (C) baked cornbread contaminated with 0.5% wheat flour; (D) baked cornbread mixed with different levels of wheat flour (0.5–10% in 0.5% increments).

Figure 2. Variance explained in the samples as a function of the number of principal components (PCs).
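Figure 2 relates the principal components to the share of variance they explain. As a minimal sketch (NumPy only, not the authors' code), the snippet below z-scores each spectral column, in the spirit of the standardization mentioned in the Abstract, derives per-component explained-variance ratios from a singular value decomposition, and picks the smallest number of PCs whose cumulative share reaches a target. The 95% target, the helper names, and the synthetic three-factor data are illustrative assumptions.

```python
import numpy as np

def standardize(X):
    """Column-wise z-score: every wavenumber gets zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled instead of dividing by zero
    return (X - mean) / std

def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component, largest first."""
    Xc = X - X.mean(axis=0)  # PCA operates on mean-centred data
    s = np.linalg.svd(Xc, full_matrices=False, compute_uv=False)
    var = s ** 2  # component variances are proportional to squared singular values
    return var / var.sum()

def n_components_for(ratio, target=0.95):
    """Smallest number of PCs whose cumulative explained variance reaches `target` (assumed 95%)."""
    return int(np.searchsorted(np.cumsum(ratio), target) + 1)

# Hypothetical stand-in for a spectra matrix: 200 samples x 50 wavenumbers,
# generated from 3 latent factors plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 50)) + 1e-3 * rng.normal(size=(200, 50))
ratio = explained_variance_ratio(standardize(X))
k = n_components_for(ratio)
print(k, ratio[:3].round(3))
```

On real spectra the curve would be read the same way: the point where the cumulative share flattens suggests how many PCs to keep as model inputs.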

Figure 3. Validation curves for (A) decision tree regressor, (B) KNN, (C) PLSR, (D) random forest regressor, and (E) support vector machine.
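A validation curve plots training and validation scores while a single hyperparameter is varied. The sketch below illustrates the idea for a KNN regressor with Manhattan distance (the metric listed for KNN in Table 5), written from scratch in NumPy. The 5-fold split, the R² score, the grid of k values, the helper names, and the synthetic data are assumptions for illustration; scikit-learn's `validation_curve` runs the same procedure for library estimators.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def knn_predict(X_train, y_train, X_query, k):
    """KNN regression with Manhattan (L1) distance: mean target of the k nearest rows."""
    d = np.abs(X_query[:, None, :] - X_train[None, :, :]).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

def validation_curve_knn(X, y, ks, n_folds=5, seed=0):
    """Mean training and validation R^2 over K folds for each candidate k."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), n_folds)
    train_scores, val_scores = [], []
    for k in ks:
        tr, va = [], []
        for i, val_idx in enumerate(folds):
            tr_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
            Xt, yt = X[tr_idx], y[tr_idx]
            tr.append(r2(yt, knn_predict(Xt, yt, Xt, k)))  # score on the fitted fold data
            va.append(r2(y[val_idx], knn_predict(Xt, yt, X[val_idx], k)))  # held-out fold
        train_scores.append(np.mean(tr))
        val_scores.append(np.mean(va))
    return np.array(train_scores), np.array(val_scores)

# Hypothetical data: 120 samples x 5 features, target tied to feature 0
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(120, 5))
y = X[:, 0] + 0.1 * rng.normal(size=120)
ks = [1, 2, 4, 8, 16]
train_r2, val_r2 = validation_curve_knn(X, y, ks)
print(train_r2.round(3), val_r2.round(3))
```

With k = 1 every training sample is its own nearest neighbour, so the training score is exactly 1.0; a gap between that and the validation score is the overfitting signal such curves are used to detect.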

Table 1. Confusion matrix parameters for the majority voting-based ensemble learning classification training model.

Class	TPR	FPR	TNR	FNR	Err	P	F1 Score
Class 1	1.0	0.0	1.0	0.0	0.0	1.0	1.0
Class 2	1.0	0.0	1.0	0.0	0.0	1.0	1.0

TPR: true positive rate, FPR: false positive rate, TNR: true negative rate, FNR: false negative rate, Err: error rate, P: precision, F1: F1 score (harmonic mean of precision and TPR). Class 1: no contamination, Class 2: contaminated with wheat.

Table 2. Confusion matrix for the majority voting-based ensemble learning classification training model.

Actual Class	Class 1	Class 2
Classified as Class 1	138	0
Classified as Class 2	0	142
Classified as Unassigned	0	0

Class 1: no contamination, Class 2: contaminated with wheat.
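The per-class rates in Tables 1 and 3 follow from the counts in Tables 2 and 4. A minimal sketch, assuming Err is the share of all samples misclassified with respect to a class (false positives plus false negatives over the total); the function name and the handling of the "Unassigned" row are illustrative choices, not taken from the paper.

```python
def per_class_metrics(cm, labels):
    """Per-class rates from a confusion matrix laid out like Tables 2 and 4:
    cm[i][j] = samples of actual class j classified as class i. Extra rows beyond
    len(labels) (e.g. 'Unassigned') count as misses for their column."""
    n = len(labels)
    total = sum(sum(row) for row in cm)
    metrics = {}
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[c]) - tp                  # classified as c, actually another class
        fn = sum(row[c] for row in cm) - tp   # actually c, classified elsewhere or unassigned
        tn = total - tp - fp - fn
        tpr, fpr = tp / (tp + fn), fp / (fp + tn)
        precision = tp / (tp + fp)
        metrics[labels[c]] = {
            "TPR": tpr, "FPR": fpr, "TNR": 1 - fpr, "FNR": 1 - tpr,
            "Err": (fp + fn) / total,         # assumed definition of the error rate
            "P": precision,
            "F1": 2 * precision * tpr / (precision + tpr),
        }
    return metrics

# Counts from Table 2 (rows: Class 1, Class 2, Unassigned; columns: actual Class 1, Class 2)
table2 = [[138, 0], [0, 142], [0, 0]]
for name, m in per_class_metrics(table2, ["Class 1", "Class 2"]).items():
    print(name, m)
```

Because both off-diagonal cells and the Unassigned row of Table 2 are zero, every class reaches TPR = P = F1 = 1.0 and FPR = FNR = Err = 0.0, matching Table 1; applying the same function to the Table 4 counts reproduces Table 3.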

Table 3. Confusion matrix parameters for the majority voting-based ensemble learning classification test model.

Class	TPR	FPR	TNR	FNR	Err	P	F1 Score
Class 1	1.0	0.0	1.0	0.0	0.0	1.0	1.0
Class 2	1.0	0.0	1.0	0.0	0.0	1.0	1.0

TPR: true positive rate, FPR: false positive rate, TNR: true negative rate, FNR: false negative rate, Err: error rate, P: precision, F1: F1 score (harmonic mean of precision and TPR). Class 1: no contamination, Class 2: contaminated with wheat.

Table 4. Confusion matrix for the majority voting-based ensemble learning classification test model.

Actual Class	Class 1	Class 2
Classified as Class 1	62	0
Classified as Class 2	0	58
Classified as Unassigned	0	0

Class 1: no contamination, Class 2: contaminated with wheat.
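The majority-voting ensemble combines the class labels output by its base classifiers (KNN, random forest, and support vector classifier, per the Abstract). The sketch below shows only the combining rule applied to hypothetical label vectors; the base models are not reproduced. The smallest-label tie-break is an assumption (scikit-learn's `VotingClassifier` with `voting="hard"` breaks ties by ascending class order); with three voters and two classes, as in this study, ties cannot occur.

```python
from collections import Counter

def majority_vote(predictions):
    """Hard voting: per sample, return the label chosen by most base classifiers.
    `predictions` holds one label sequence per classifier, all of equal length."""
    combined = []
    for votes in zip(*predictions):
        counts = Counter(votes)
        top = max(counts.values())
        # Assumed tie-break: the smallest label among those with the top count
        combined.append(min(label for label, c in counts.items() if c == top))
    return combined

# Hypothetical outputs of the three base models (1 = no contamination, 2 = wheat)
knn = [1, 2, 2, 1]
rf = [1, 2, 1, 1]
svc = [2, 2, 2, 1]
print(majority_vote([knn, rf, svc]))  # [1, 2, 2, 1]
```

A single dissenting model is outvoted on each sample, which is how the ensemble can be more robust than any one of its members.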

Table 5. Prediction performance of different learning algorithms.

Learning Algorithm	Hyperparameters	R²_T (Training)	RMSET (Training)	R²_P (Prediction)	RMSEP (Prediction)
Random Forest (rf)	n_estimators = 991	1.00	0.00	0.516	2.064
K-Nearest Neighbors (KNN)	neighbors = 4, metric = ‘manhattan’	0.98	0.41	0.99	0.34
Decision Tree (dct)	max_depth = 6	0.99	0.29	0.57	1.93
Support Vector Machine (svr)	gamma = 0.03	0.90	0.98	0.75	1.47
Partial Least Squares Regression (PLSR)	n_components = 30	0.99	0.08	0.97	0.52
Ensemble Method (voting)	(rf, KNN, dct, svr, wt = none)	0.99	1.29	0.81	1.29

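wt: weight of the base models in the voting ensemble, n_estimators: number of trees, neighbors: number of nearest neighbors (k), metric: distance metric, max_depth: maximum tree depth, gamma: kernel coefficient, n_components: number of latent components, R²_T: coefficient of determination for the training set, R²_P: coefficient of determination for the prediction set, RMSET: root mean square error of training, RMSEP: root mean square error of prediction.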
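Table 5 scores each regressor by R² and RMSE, and its voting ensemble combines the base regressors' outputs. As an illustrative sketch (hypothetical helper names and made-up predictions, NumPy only), the metrics and a voting regressor can be written as below. Treating "wt = none" as equal weights is an assumption, consistent with how scikit-learn's `VotingRegressor` handles `weights=None`.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the target."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def voting_regression(model_predictions, weights=None):
    """Voting regressor: (weighted) mean of the base models' predictions per sample.
    weights=None averages the models equally (assumed meaning of 'wt = none')."""
    return np.average(np.asarray(model_predictions, float), axis=0, weights=weights)

# Hypothetical reference WF levels (%) and predictions from two base models
y_true = [0.0, 2.5, 5.0, 7.5, 10.0]
pred_a = [0.5, 2.0, 5.5, 7.0, 9.5]
pred_b = [0.0, 3.5, 4.0, 8.5, 10.0]
y_vote = voting_regression([pred_a, pred_b])
print(r2_score(y_true, y_vote), rmse(y_true, y_vote))
```

Averaging helps only when the base models' errors partly cancel; a weak member (such as the random forest and decision tree rows in Table 5) can pull the ensemble below its best single model, which is consistent with KNN outperforming the voting ensemble there.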

Table 6. ELISA test results for gluten quantification in the cornbread samples contaminated with wheat flour (WF).

WF Level in Selected Contaminated Bread Samples (%)	Gluten Level (ppm)	Label
0	0	Gluten-free
0.5	3.04	Gluten-free
1	4.89	Gluten-free
1.5	10.54	Gluten-free
2.5	14.81	Gluten-free
3.5	19.84	Gluten-free
4.5	***	***
5.5	38.20	Gluten-contaminated
6.5	-	-
7.5	40.96	Gluten-contaminated
8.5	-	-
9.5	47.43	Gluten-contaminated
10	60.75	Gluten-contaminated

*** Marks the threshold level, above which gluten values exceed 20 ppm. -: missing data due to error.
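The labels in Table 6 follow from comparing each ELISA reading with a 20 ppm cut-off. A minimal sketch, assuming the US FDA rule that gluten-free foods contain less than 20 ppm gluten; the readings are copied from Table 6 with the "***" and "-" rows left out, and the function name is hypothetical.

```python
# Assumed cut-off: US FDA "gluten-free" foods contain less than 20 ppm gluten.
GLUTEN_FREE_LIMIT_PPM = 20.0

def elisa_label(gluten_ppm):
    """Map an ELISA gluten reading (ppm) to the labels used in Table 6."""
    if gluten_ppm is None:  # missing measurement ('-' in Table 6)
        return None
    return "Gluten-free" if gluten_ppm < GLUTEN_FREE_LIMIT_PPM else "Gluten-contaminated"

# Measured values from Table 6 (WF level % -> gluten ppm)
table6 = {0: 0.0, 0.5: 3.04, 1: 4.89, 1.5: 10.54, 2.5: 14.81, 3.5: 19.84,
          5.5: 38.20, 7.5: 40.96, 9.5: 47.43, 10: 60.75}
labels = {wf: elisa_label(ppm) for wf, ppm in table6.items()}
print(labels)
```

Under this rule the 3.5% WF sample (19.84 ppm) is the last one that still qualifies as gluten-free, which is why the ELISA cut-off falls between 3.5% and 5.5% WF.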

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
