Enhancing Shelf Life Prediction of Fresh Pizza with Regression Models and Low Cost Sensors

Wunderlich, Paul; Pauli, Daniel; Neumaier, Michael; Wisser, Stephanie; Danneel, Hans-Jürgen; Lohweg, Volker; Dörksen, Helene

doi:10.3390/foods12061347

¹ inIT–Institute Industrial IT, OWL University of Applied Sciences and Arts, 32657 Lemgo, Germany

² Institute for Life Science Technologies (ILT.NRW), OWL University of Applied Sciences and Arts, 32657 Lemgo, Germany

* Author to whom correspondence should be addressed.

Foods 2023, 12(6), 1347; https://doi.org/10.3390/foods12061347

Submission received: 1 March 2023 / Revised: 15 March 2023 / Accepted: 20 March 2023 / Published: 22 March 2023

(This article belongs to the Topic Recent Advances and Insights in Storage, Spoilage and Shelf Life of Foods)

Abstract

Food waste presents a challenge for achieving a sustainable world. In Germany alone, over 10 million tonnes of food are discarded annually, with a worldwide total exceeding 1.3 billion tonnes. A significant contributor to this issue is consumers throwing away still edible food because its best-before date has expired. Best-before dates currently include large safety margins, so more precise and cost-effective prediction techniques are required. To address this challenge, low-cost sensors were investigated and machine learning techniques were developed to predict the spoilage of fresh pizza. The findings indicate that combining a gas sensor, such as a volatile organic compound (VOC) or carbon dioxide (CO₂) sensor, with a random forest or extreme gradient boosting regressor can accurately predict the day of spoilage. This provides a more accurate and cost-efficient alternative to current methods of determining best-before dates, reducing food waste, saving resources, and improving food safety by lowering the risk of consumers eating spoiled food.

Keywords:

spoilage prediction; regression models; low-cost sensors; machine learning; sustainability; food waste

1. Introduction

A major concern of the global Sustainable Development Goals (SDGs) is to counteract food waste (Target 12.3) [1]. In Germany alone, about 10 million tonnes of edible food are thrown away every year for a variety of reasons [2]. Worldwide, about 1.3 billion tonnes of food are lost or wasted [3]. Of the German total, a decisive share of 59% (6.5 million tonnes) is generated in private households, where in turn 30% of food is thrown away because of an expired best-before date (BBD) [4].

The best-before date assures the end consumer that, under appropriate storage conditions, the food will retain its specific properties (e.g., taste, colour and consistency) and can be consumed without hesitation, taking into account all influences along the food supply chain [5]. Based on various storage tests and quality controls, manufacturers establish a safety time buffer intended to compensate for potentially less than ideal handling during transport or storage after production. This gives food producers, both in and outside Europe, certainty that the product still has the promised properties up to the best-before date [6]. Because of this safety buffer, however, these food products can often still be consumed after the best-before date has expired. The technical processes in food processing depend on the quality of the individual raw materials and on fluctuations during processing. Therefore, each batch carries its own risks of chemical or microbial spoilage and of allergens or contaminants [7].

In an era of diminishing resource availability and in light of the European Union targets for sustainable food systems [8], innovative but also economically viable solutions are needed to improve best-before date prediction. Machine learning can help here by improving the predictions of the best-before dates printed on food packaging and thereby reducing overall food waste. The mechanisms that must be integrated for this purpose, such as data collection for real-time quality control in production, also make a valuable contribution to food safety.

This leads to the following research questions:

I. What are examples of the use of machine learning in the food industry regarding spoilage detection, and what are their limitations?
II. What types of sensors provide the most relevant and accurate data for determining the shelf life of fresh pizza?
III. Which machine learning algorithm can be used effectively to predict the shelf life of fresh pizza from sensor data?

The research aims to demonstrate the feasibility of incorporating low-cost sensors into food shelf-life prediction through a use case involving fresh pizza stored in a refrigerator. The complex food matrix of pizza provides a suitable case for evaluating the performance of low-cost sensors such as gas, ethanol and pH sensors and near-infrared (NIR) spectroscopy. A prediction concept is developed based on the collected sensor data, and machine learning models are applied to predict the shelf life of the fresh pizza. The models are evaluated using established machine learning metrics to determine their efficacy and accuracy.

2. Relevant Work

This section provides an overview of the state of research on the spoilage of ready-to-eat pizza and on the integration of machine learning in the food industry. Recent research findings and technological innovations are presented, with a focus on the use of machine learning for enhancing spoilage detection and prevention in pizza and other food products.

Most food technology studies on ready-to-eat pizza focus on the course of the spoilage process and on whether a food is still edible. Singh, Wani and Goyal [9,10,11] examined the effects of different modified atmospheres during refrigerated storage on the sensory, microbiological and chemical properties, as well as the quality and shelf life, of a homemade vegetarian ready-to-bake pizza. Fasano and Gallo [12] conducted a shelf life study and environmental monitoring of a pizza base with tomato puree under modified atmospheres, examining it microbiologically and for changes in gas concentration. Spoilage processes have also been investigated for other components of a ready-made pizza, such as mozzarella cheese. Alves et al. [13] studied the sensory properties of letter-cut mozzarella cheese under different modified atmospheres, including its odour, taste and overall quality. Alam and Goyal [14] investigated the effects of different packaging materials and modified atmospheres on the microbiological quality of home-made mozzarella cheese during refrigerated storage. Likewise, different types of vegetables, such as courgettes and peppers, have been studied. These studies primarily examine how certain packaging materials or packaging strategies affect shelf life; in particular, sensory and chemical properties as well as gas concentration were analysed. The examination of courgettes by Lucera et al. [15] also included an analysis of microbiological contamination, and the testing of green bell peppers by Manolopoulou et al. [16] included the measurement of colour indices. Oliveira et al. [17] studied the influence of temperature and the number of film perforations on sliced mushrooms. They evaluated the quality of the mushrooms and developed a kinetic shelf-life model for modified atmosphere packaging.
Investigations into the spoilage process of ready-to-eat pizza and its components are currently mostly limited to basic laboratory studies, which are time-consuming and very cost-intensive. The use of machine learning (ML) in the food industry has therefore gained popularity in recent years, as it offers a non-destructive and efficient way to evaluate the quality and safety of food products. ML techniques have been applied to various aspects of food evaluation, including the classification of food products based on colour, texture and chemical properties, as well as the detection of defects and contamination. For instance, Ireri et al. [18] employed support vector machines (SVM) to classify tomatoes based on their colour and texture features, with a focus on detecting defects and stains on their surface. Kanade et al. [19] used the K-nearest neighbour (KNN) algorithm to classify guava fruits. Liang et al. [20] proposed a separate fruit tray system and a deep learning-based method to detect and classify high-quality apples in real time. Basak et al. [21] developed a non-destructive method using ML algorithms to predict the total soluble solids (TSS) and pH of strawberries. Image processing techniques have also been used in several studies to assess food quality. For example, Kumar et al. [22] used ML systems to evaluate the quality of pomegranate fruits, while Ropelewska et al. [23] distinguished fresh and lacto-fermented red bell pepper samples using image texture analysis and the KNN algorithm. Other researchers have employed more advanced techniques such as neural networks (NN). Basile et al. [24] used non-destructive NIR spectroscopy and ML to predict texture parameters and TSS content in intact berries. Xiong et al. [25] proposed a transfer-learning-based model using a 3D-printed electronic nose and deep learning to detect the freshness of chicken breasts. Kim et al. [26] proposed a deep learning-based Haugh unit (HU) prediction model that determines egg freshness from non-destructive weight loss measurements. The model uses a stacked convolutional neural network (CNN) and long short-term memory (LSTM) architecture with data augmentation to improve the accuracy of HU prediction compared to traditional ML methods. Furthermore, some researchers have combined ML with other technologies to evaluate food quality. For instance, Darwish et al. [27] proposed a novel approach combining microwave (MW) sensing technology with ML tools such as multilayer perceptrons (MLP) and SVM to classify food products as contaminated or uncontaminated with high accuracy. Fengou et al. [28] investigated the use of Fourier-transform infrared (FTIR) spectroscopy and multispectral imaging in combination with ML algorithms to evaluate the microbiological quality of chicken burgers. Cheng et al. [29] used near-infrared spectroscopy and hyperspectral imaging data in a partial least squares regression to predict the chemical properties of fish muscle tissue. Faqeerzada et al. [30] investigated shortwave-infrared hyperspectral imaging (SWIR-HSI) combined with the one-class classifier DD-SIMCA for high-throughput quality screening of almond powder with respect to potential adulteration. Finally, Kang et al. [31] provide a comprehensive review of current applications of machine learning and hyperspectral imaging in the food supply chain for the non-destructive testing and evaluation of food quality and safety attributes. Özdoğan et al. [32] give a detailed overview of recent developments in hyperspectral imaging systems for determining sensory properties such as colour, defects, texture, taste, freshness and ripeness in various foods. The authors note that the visible and near-infrared region is the most commonly used spectral range for sensory evaluation, and that linear regression models are the most commonly used multivariate analysis techniques.

In conclusion, machine learning has proven to be a useful tool in the food industry for quickly and effectively assessing food quality and safety. Various sensors, including hyperspectral imaging, spectroscopy and microwave sensing, have been combined with machine learning techniques such as support vector machines, K-nearest neighbours and neural networks, including deep learning models, to detect defects, classify food products, and predict chemical and sensory properties. These methods have been applied to a variety of foods, including fruits, vegetables, meat and dairy products, with encouraging results in detecting contaminants and determining ripeness.

3. Materials and Methods

In order to make more accurate predictions regarding the shelf-life of fresh pizzas, several prerequisites must be met. To begin, data must be gathered for a prediction model that utilizes machine learning techniques. This data should encompass various characteristics of the fresh pizza, specifically those that change during the spoilage process. These characteristics can be measured using various sensor technologies and serve as the foundation for the data. A comprehensive overview of the overall concept is depicted in Figure 1.

The methodology for predicting the shelf-life of fresh pizzas involves five steps, starting with sensor data acquisition and culminating in the development and deployment of a machine learning model and its predictions. Once the sensors have collected data, a parser converts the sensor data from the manufacturer-specific file format to a uniform file format. The data must then be cleaned, transformed and prepared for modeling. These stages are discussed in greater detail below. A regression model can then be trained to make predictions. Finally, these predictions must be evaluated, and the regression model may need to be fine-tuned.

3.1. Measurement Setup

The first step in the process is to acquire sensory information from the pizza using a specially designed measurement setup, as illustrated in Figure 2.

The measurement setup consists of a Binder MKF 115 E 3.1 climate-controlled cabinet [33] that houses a desiccator. The cabinet is configured to maintain a set temperature, which in turn determines the temperature inside the desiccator. The desiccator is a nearly airtight container that provides a secure base for attaching sensors. In addition, like a package, the desiccator retains moisture and protects the product from drying out. This allows sensor data to be collected over a longer period before the pizza dries out. After the appropriate storage period, the pizza slice is removed and the desiccator is carefully cleaned. The necessary sensors (S1, S2, S3 and S4) are placed within the measurement setup and are operated remotely by a control unit. Data are transferred between the sensors and the Windows-based control unit via a USB hub and a USB cable.

3.2. Sensors

For the storage tests, the following sensors were used in the measurement setup:

CO₂ sensor
VOC sensor
Ethanol sensor
pH sensor
NIR sensor

3.2.1. CO₂ and VOC Sensors

The SCD30 sensor [34] is a carbon dioxide (CO₂) sensor and the SVM40 sensor [35] is a volatile organic compound (VOC) sensor. Both sensors are made by Sensirion (Stäfa, Switzerland) and were controlled in our storage tests via the “SEK-ControlCenter” software. The SCD30 is a non-dispersive infrared (NDIR) sensor with a measurement range of 400–10,000 ppm. It takes a measurement every two seconds and is connected via an Ethernet cable to a sensor bridge, which in turn is connected to the control unit. The SVM40 is based on a metal oxide semiconductor sensor and is connected to the computer via UART (Universal Asynchronous Receiver-Transmitter) and a USB-C connection. It can output both a processed value and a raw digital signal (SRAW VOC). Both sensors also include an integrated temperature and humidity sensor.

3.2.2. Ethanol Sensor

The GDX-ETHO sensor from Vernier [36] is also based on a metal oxide semiconductor sensor and measures the ethanol content in the vapor phase. The sensor can be connected via Bluetooth 4.2 or USB-C. It reports values in ppm at a time interval configured in advance in Vernier’s “Graphical Analysis®” software.

3.2.3. pH Sensor

The MultiLine® Multi 3620 IDS [37] pocket multi-parameter meter from Xylem Inc. (Washington, DC, USA) was used in combination with the SenTix® Sp-T 900 [38] pH electrode to measure the pH value. This combination allows measurements in a pH range of 2 to 13 with an accuracy of ±0.004. The measurement interval and period can be defined in advance. The pH electrode, which is designed for penetration measurements in semi-solid foods and was placed at the edge of the dough, was recalibrated with the appropriate buffer solution before each measurement.

3.2.4. NIR Sensor

For optical measurements, a near-infrared sensor, the Tellspec Enterprise Sensor [39], was used. This sensor measures in the range of 900 to 1700 nm and is based on the NIR-S-G1 sensor module from Inno Spectra [40]. It is powered via USB and controlled with the Inno Spectra Corporation NIR Scan software. Each recorded measurement is the average of 50 individual measurements. The recorded values are absorption, reflection and intensity.

3.3. Storage Tests

The storage tests were conducted using the measurement setup described above. In total, 12 storage tests were performed, each consisting of a measurement series with one pizza sample. The climate chamber was set to a temperature of 5 °C and a relative humidity of 60%. The use case is a fresh pizza richly topped with grilled courgette, grilled sweet pepper and mozzarella. The ingredients are wheat flour, 15% strained tomatoes, 12% grilled sweet peppers, 12% firm mozzarella cheese, water, 6.3% grilled courgettes, 3.1% grilled aubergines, 1.4% tomato concentrate, rapeseed oil, salt, baker’s yeast, extra virgin olive oil, sugar, oregano, parsley, garlic, onions, pepper, basil and fried onions. In its original state, it is a commercial frozen pizza suitable for vegetarians, which was defrosted in the refrigerator before measurement. A representative 40 g sample of a thawed, unopened pizza was placed in the desiccator close to the sensors. The sample was monitored over a 14-day period under controlled temperature and humidity conditions; the temperature inside the desiccator was approximately 7 °C and the relative humidity approximately 70%. NIR, pH and ethanol measurements were taken every 20 min, while the CO₂ sensor measured every 2 s and the VOC sensor once per second.

3.4. Parsing and Data Preprocessing

The sensors digitally captured the fresh pizza during the storage tests. However, these data are typically in manufacturer-specific formats, which makes data processing more difficult. Efficient and streamlined data processing requires a uniform data format, and the csv file format was chosen for this purpose. Parsers were developed and implemented to convert the sensor data into csv files. Next, the data are preprocessed: outliers and faulty measurements are removed, and missing values are imputed where necessary. In this form, the sensor data are not yet sufficient for learning a regression model. Since regression is a supervised learning method, it requires not only features (sensor data) but also labels (target values) to learn from. Labels were therefore created based on the shelf life of each pizza. The day of spoilage was defined by food experts, and a minimum safety buffer of 1 day was additionally applied. It was determined on pizzas from the same batch by means of microbiological and human sensory tests. To create the labels, the day of spoilage is always set to 0, and the days before it count down with positive values; for example, a label of 7 means that there are still 7 days until the day of spoilage. Days after the day of spoilage count down with negative values: -1 for the first day after, -2 for the second day after, and so on. The regression model is thus trained to determine the position on the time axis relative to the day of spoilage.
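The cleaning and labelling steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the parsers or cleaning rules actually used in the study: outliers are flagged with a hypothetical 1.5 × IQR rule and, like missing readings, filled by linear interpolation, and the label of a reading is derived from the whole storage day it falls on, assuming `spoilage_day` already includes the 1-day safety buffer. The function names `clean_series` and `label_for_reading` are hypothetical.

```python
import statistics
from typing import List, Optional

SECONDS_PER_DAY = 86_400


def clean_series(values: List[Optional[float]], k: float = 1.5) -> List[float]:
    """Remove outliers (assumed k * IQR rule) and impute gaps by linear interpolation."""
    valid = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(valid, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    # Treat outliers like missing readings so both are imputed the same way.
    marked = [v if v is not None and low <= v <= high else None for v in values]
    known = [i for i, v in enumerate(marked) if v is not None]

    cleaned = []
    for i, v in enumerate(marked):
        if v is not None:
            cleaned.append(v)
            continue
        left = max((j for j in known if j < i), default=None)
        right = min((j for j in known if j > i), default=None)
        if left is None:  # gap at the start: copy the first valid value
            cleaned.append(marked[right])
        elif right is None:  # gap at the end: copy the last valid value
            cleaned.append(marked[left])
        else:  # interior gap: interpolate between the neighbouring valid values
            t = (i - left) / (right - left)
            cleaned.append(marked[left] + t * (marked[right] - marked[left]))
    return cleaned


def label_for_reading(elapsed_s: float, spoilage_day: int) -> int:
    """Days until spoilage: positive before, 0 on, negative after the day of spoilage."""
    day = int(elapsed_s // SECONDS_PER_DAY)  # storage day the reading belongs to
    return spoilage_day - day
```

With `spoilage_day = 7`, a reading taken at the start of storage is labelled 7, a reading on storage day 7 is labelled 0, and a reading on day 9 is labelled -2, matching the counting scheme described above.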

3.5. Data

The data provided to the model consist of preprocessed values from the various sensors, along with the labels described above. The following overview shows what each sensor measured. The CO₂ sensor measures carbon dioxide levels, temperature and humidity. The VOC sensor measures volatile organic compound (VOC) levels, temperature and humidity. The ethanol sensor measures ethanol levels in parts per million (ppm). The pH sensor measures the pH value and temperature. Lastly, the NIR sensor measures absorption values for wavelengths between 900 and 1700 nm. As an example, a subset of the VOC data is presented in Table 1.

The data from the other sensors follow a similar format.

4. Results

After the storage tests have been completed and the data preprocessed, a machine learning model can be trained to predict the day of spoilage from the features and labels. Regression methods, a type of supervised learning, are particularly well suited for this task. In supervised learning, the model learns the relationship between the input variables (features) and the output variable (label) by fitting a mathematical function to the data. The parameters of this function are chosen so that the predictions of the label are as accurate as possible [41].

4.1. Regression Model Concept

The concept for creating the regression model is illustrated in Figure 3.

Here, the preprocessed sensor data are divided into training data and test data. The training data are used as input for the regression method to create the regression model. The test data, in turn, serve as new, previously unseen data, so that the model’s reliability on unknown data can be checked in an evaluation. Different metrics are calculated in this evaluation to determine the quality of the model. Based on these metrics, the regression model can be optimized by adjusting specific hyperparameters. Once the regression model has been fully trained and optimized, it is ready to be used for prediction.

4.2. Regression Algorithms

Learning a regression model can be achieved through various algorithms. In this work, we compare and investigate the effectiveness of two specific methods, namely the Random Forest Regressor [42] and the XGBoost Regressor [43]. We evaluate their performance and suitability for the prediction of the day of spoilage. More complex machine learning methods, such as neural networks, were not explored because they require more time and computational power, and need significantly more labeled data for training. The labeling of data in the food industry is time-consuming and expensive. More practical and easier-to-understand methods, such as random forest and XGBoost, were preferred.

4.2.1. Random Forest Regressor

The Random Forest Regressor [42] is an implementation of the Random Forest algorithm for regression tasks. Random Forest is an ensemble learning technique that makes predictions using multiple decision trees. Each decision tree in a Random Forest Regressor is trained on a random bootstrap sample of the training data, and the predictions of the trees are averaged to produce the final prediction. This approach helps to reduce overfitting and improve the model’s overall accuracy.

The algorithm is presented in pseudocode in Algorithm 1. It requires several inputs: the data set, the number of trees in the ensemble (n_estimators), the minimum number of samples required to split an internal node (min_samples_split) and the maximum depth of the trees (max_depth). These parameters control the complexity and performance of the model. The algorithm first extracts the features (X) and the labels (Y) from the data set and then splits the data into training and test sets with a ratio of 75 to 25. Next, it initializes an empty list “forest” to store the decision trees and repeatedly samples the training data with replacement. In each iteration, it fits a decision tree to the sample, using min_samples_split and max_depth to limit the growth of the tree, and appends the tree to the list. By using random subsets of the data and random subsets of the features at each split (a step not shown explicitly in Algorithm 1), a random forest reduces overfitting and improves generalization. After all decision trees have been fitted, they are combined into the random forest model “rfr”, which is returned as the output.

Algorithm 1 Random Forest Regression.

procedure RandomForest(data, n_estimators, min_samples_split, max_depth)
    Extract features (X) and labels (Y) from the data
    X_train, X_test, Y_train, Y_test ← split(data, ratio = 0.75)
    Initialize an empty list forest to store decision trees
    for i in range(n_estimators) do
        Randomly sample X'_train, Y'_train from X_train, Y_train with replacement
        Fit a decision tree to X'_train and Y'_train with min_samples_split and max_depth
        Add the fitted decision tree to forest
    end for
    Combine the decision trees in forest into a random forest model rfr that averages their predictions
    return the fitted model rfr
end procedure
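Algorithm 1 can be turned into a runnable sketch with scikit-learn’s `DecisionTreeRegressor` as the base learner. The class name `BaggedForest`, the helper `random_forest`, the seeds and the `max_features` option (which adds the per-split feature subsampling mentioned above) are illustrative assumptions; in practice, scikit-learn’s `RandomForestRegressor` implements the same idea directly.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


class BaggedForest:
    """Minimal random forest: trees fitted on bootstrap samples, predictions averaged."""

    def __init__(self, n_estimators=100, min_samples_split=2, max_depth=None,
                 max_features=1.0, random_state=0):
        self.n_estimators = n_estimators
        self.min_samples_split = min_samples_split
        self.max_depth = max_depth
        self.max_features = max_features  # < 1.0 samples a feature subset at each split
        self.random_state = random_state

    def fit(self, X, y):
        rng = np.random.default_rng(self.random_state)
        self.forest_ = []
        n = len(X)
        for _ in range(self.n_estimators):
            idx = rng.integers(0, n, size=n)  # bootstrap sample drawn with replacement
            tree = DecisionTreeRegressor(
                min_samples_split=self.min_samples_split,
                max_depth=self.max_depth,
                max_features=self.max_features,
                random_state=int(rng.integers(2**31 - 1)),
            )
            self.forest_.append(tree.fit(X[idx], y[idx]))
        return self

    def predict(self, X):
        # The forest's prediction is the mean of the individual tree predictions.
        return np.mean([tree.predict(X) for tree in self.forest_], axis=0)


def random_forest(X, y, n_estimators, min_samples_split, max_depth, seed=0):
    """Follows Algorithm 1: 75/25 split, then fit the forest on the training part."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.75, random_state=seed)
    rfr = BaggedForest(n_estimators, min_samples_split, max_depth, random_state=seed)
    return rfr.fit(X_train, y_train), (X_test, y_test)
```

Returning the held-out test split alongside the model keeps the evaluation data separate from training, as in the concept of Figure 3.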

A RandomizedSearchCV with cv = 5 was conducted for each Random Forest Regressor to optimize the hyperparameters n_estimators, max_depth and min_samples_split. RandomizedSearchCV allows an efficient search of the hyperparameter space, as it evaluates only a fixed number of randomly sampled hyperparameter combinations. However, the search is not exhaustive, so other combinations of hyperparameters could yield better results. The optimized hyperparameters for each Random Forest Regressor are shown in Table 2.
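A search of this kind can be sketched with scikit-learn’s `RandomizedSearchCV`. The search space below is a hypothetical example, since the ranges actually searched are not given here; the scoring function (negative MSE), `n_iter`, the seeds and the helper name `tune_random_forest` are likewise assumptions.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Hypothetical search space; the values actually searched are not reported here.
PARAM_DISTRIBUTIONS = {
    "n_estimators": [50, 100, 200, 400],
    "max_depth": [None, 5, 10, 20],
    "min_samples_split": [2, 5, 10],
}


def tune_random_forest(X, y, n_iter=10, seed=0):
    """Randomized 5-fold CV search on the 75 % training split; returns params and test R²."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.75, random_state=seed)
    search = RandomizedSearchCV(
        RandomForestRegressor(random_state=seed),
        param_distributions=PARAM_DISTRIBUTIONS,
        n_iter=n_iter,  # number of sampled combinations (48 exist in total)
        cv=5,
        scoring="neg_mean_squared_error",
        random_state=seed,
    )
    search.fit(X_train, y_train)
    # best_estimator_ is refitted on the whole training split (refit=True by default).
    return search.best_params_, search.best_estimator_.score(X_test, y_test)
```

Because only `n_iter` of the 48 possible combinations are evaluated, the search trades completeness for speed, which is the limitation noted above.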

4.2.2. XGBoost Regressor

The XGBoost Regressor is an implementation of the XGBoost algorithm for regression analysis. Regression is a supervised learning method used to predict a continuous outcome variable (also known as the response variable) from one or more predictor variables. XGBoost trains the regressor with gradient boosting: the model is built from several weak learners (such as decision trees) that are added one after another, each new learner being fitted to the errors (more precisely, the gradient of the loss function) of the ensemble built so far. With this approach, XGBoost often achieves highly accurate predictions on regression tasks. The XGBoost algorithm is outlined in pseudocode in Algorithm 2.
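The boosting idea described above can be illustrated with a minimal gradient boosting loop for squared-error loss, where the negative gradient is simply the residual. This is plain gradient boosting with scikit-learn regression trees, not XGBoost itself: XGBoost additionally uses second-order gradient information and regularization terms such as gamma. The function names and default values are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_gradient_boosting(X, y, n_estimators=100, learning_rate=0.1, max_depth=3):
    """Plain gradient boosting for squared-error loss (not the full XGBoost algorithm)."""
    base = float(np.mean(y))  # start from a constant prediction
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_estimators):
        # For L = 0.5 * (y - pred)**2, the negative gradient is the residual y - pred.
        residuals = y - pred
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        pred = pred + learning_rate * tree.predict(X)  # shrunken correction step
        trees.append(tree)
    return base, learning_rate, trees


def predict_gradient_boosting(model, X):
    """Sum the constant start value and all shrunken tree corrections."""
    base, learning_rate, trees = model
    return base + learning_rate * sum(tree.predict(X) for tree in trees)
```

Each tree only has to correct what the previous trees got wrong, and the learning rate keeps each correction small, which is why more trees are usually needed when the learning rate is lowered.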

Algorithm 2 XGBoost Regressor.

procedure XGBoost(data, n_estimators, max_depth, learning_rate, gamma)
    Extract features (X) and labels (Y) from the data
    X_train, X_test, Y_train, Y_test ← split(data, ratio = 0.75)
    Create a new XGBoost model xgbr with n_estimators, max_depth, learning_rate and gamma
    Fit xgbr on X_train, Y_train
    return the fitted model xgbr
end procedure

The algorithm requires as input the sensor data (features and labels) and the hyperparameters: the number of trees in the model (n_estimators), the maximum depth of the trees (max_depth), the learning rate (learning_rate) and the regularization term (gamma), which specifies the minimum loss reduction required to make a further split in a tree. These hyperparameters are crucial for regulating the model’s complexity and effectiveness. The sensor data are divided into training and test sets with a ratio of 75 to 25. The algorithm then initializes the XGBoost model with the chosen hyperparameters and fits it to the training set. Finally, it returns the fitted model “xgbr” as the output.

To optimize the hyperparameters max_depth, learning_rate, n_estimators and gamma, a RandomizedSearchCV with cv = 5 was conducted for each XGBoost Regressor. The optimized hyperparameters are shown in Table 3.

4.3. Evaluation

A Random Forest Regressor and an XGBoost Regressor were trained for each of the sensors from Section 3.2. The models are evaluated below, and their performance is determined using the following metrics:

R-Squared ( $R^{2}$ )
Mean Squared Error (MSE)
Root Mean Squared Error (RMSE)
Mean Absolute Error (MAE)
Symmetric Mean Absolute Percentage Error (SMAPE)

$R^{2}$ [44] is a statistical measure of the goodness of fit of a regression. It typically ranges from 0 to 1, where 1 represents a perfect fit; on test data it can also become negative if the model predicts worse than the mean of the true values.
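For reference, the usual definition of $R^{2}$ is given below, with $y_{i}$ the true values, ${\hat{y}}_{i}$ the predicted values, $\bar{y}$ the mean of the true values, n the number of observations and MSE as defined in Equation (1). The second form shows that $R^{2}$ becomes negative whenever the MSE exceeds the variance of the true values.

```latex
R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}
      = 1 - \frac{\mathrm{MSE}}{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}
```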

The Mean Squared Error (MSE) [45] is a metric commonly used in regression analysis. It measures the average squared difference between the predicted values of the regression model and the true values:

MSE = \frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2} \qquad (1)

where n is the number of observations, $y_{i}$ is the true value of the i-th observation, and ${\hat{y}}_{i}$ is the predicted value of the i-th observation. Although there is no absolute threshold that indicates whether a model is good, the MSE can be used to compare models with one another. The same applies to the other metrics.

The Root Mean Squared Error (RMSE) [46] is the square root of the MSE. It is a popular evaluation metric in regression analysis and machine learning. It is similar to the MSE but easier to interpret, since it has the same unit as the predicted values (here, days):

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}} \qquad (2)

The Mean Absolute Error (MAE) [46] measures the average magnitude of the errors in a prediction, that is, how far the predicted values are from the true values, without indicating the direction of the deviation:

MAE = \frac{1}{n} \sum_{i = 1}^{n} | y_{i} - {\hat{y}}_{i} | \qquad (3)

Here, n is the number of data points, $y_{i}$ is the true value for the i-th data point, and ${\hat{y}}_{i}$ is the predicted value for the i-th data point. By taking the absolute difference between the true and predicted values and averaging it over all data points, the MAE provides a measure of the average magnitude of the prediction errors.

The Symmetric Mean Absolute Percentage Error (SMAPE) [47] is an accuracy measure based on percentage errors and is calculated as follows:

SMAPE = \frac{1}{n} \sum_{i = 1}^{n} \frac{2 | y_{i} - {\hat{y}}_{i} |}{| y_{i} | + | {\hat{y}}_{i} |} \times 100\% \qquad (4)

SMAPE is closely related to the Mean Absolute Percentage Error (MAPE) [48], but SMAPE is preferred when true values close to zero matter: MAPE divides by $| y_{i} |$ and is therefore undefined for true values of zero, which occur here because the day of spoilage is labeled 0. However, one disadvantage of SMAPE is that it is harder to interpret. MAPE has no upper bound, whereas SMAPE as defined in Equation (4) ranges from 0% to 200%, with 0% being the best value.
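Equations (1) to (4) and $R^{2}$ can be computed with a few lines of plain Python, shown here as a sketch. One assumption goes beyond the text: a SMAPE term with $y_{i} = {\hat{y}}_{i} = 0$ would be 0/0, and this sketch counts it as zero error. The function names are illustrative.

```python
import math
from typing import Sequence


def mse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Equation (1): mean of the squared errors."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Equation (2): square root of the MSE, in the unit of the labels (days)."""
    return math.sqrt(mse(y, y_hat))


def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Equation (3): mean of the absolute errors."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def smape(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Equation (4) in percent; 0/0 terms are scored as 0 (an assumed convention)."""
    terms = []
    for a, b in zip(y, y_hat):
        denom = abs(a) + abs(b)
        terms.append(0.0 if denom == 0 else 2 * abs(a - b) / denom)
    return sum(terms) / len(terms) * 100


def r2(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Coefficient of determination: 1 - residual / total sum of squares."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot
```

For true labels [3, -1, 0, 2] and predictions [2, -1, 1, 2], the label 0 makes MAPE undefined, while SMAPE evaluates to 60%; the single prediction of 1 for a true 0 alone contributes the maximum term of 2 (200%).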

A basic linear regression model [49] serves as a benchmark for assessing the performance of the Random Forest Regressor and the XGBoost Regressor. Its results are shown in Table 4, which contains the $R^{2}$, MSE, RMSE, MAE and SMAPE values of the regressor for each sensor. These results indicate the strength of the relationship between each sensor and the output, as well as the accuracy of the model.

The NIR sensor has the highest $R^{2}$ value of 0.86, indicating a strong linear relationship between the sensor data and the output. The CO₂ sensor follows with an $R^{2}$ value of 0.60, meaning that the model explains about 60% of the variability in the data. On the other hand, the pH sensor has the lowest $R^{2}$ value of 0.26, indicating a weak linear relationship. The MSE values range from 1.89 for the NIR sensor to 11.02 for the pH sensor, with the CO₂ sensor having the second lowest value of 5.50. Similarly, the RMSE values range from 1.37 for the NIR sensor to 3.32 for the pH sensor, with the CO₂ sensor having the second lowest value of 2.34. The MAE values range from 1.04 for the NIR sensor to 2.82 for the pH sensor, with the CO₂ sensor having the second lowest value of 1.70. Finally, the SMAPE values range from 63.27% for the NIR sensor to 121.95% for the pH sensor, with the CO₂ sensor having the second lowest value of 72.32%. In summary, the results of the benchmark linear regression model show that the NIR sensor has the best performance in terms of $R^{2}$, MSE, RMSE, MAE, and SMAPE among the sensors used, while the pH sensor has the worst performance.
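$R^{2}$ is not restated in this section; assuming the usual coefficient of determination, $R^{2} = 1 - SS_{res}/SS_{tot}$, the sketch below computes it for hypothetical values. It also shows the equivalent form $R^{2} = 1 - \mathrm{MSE}/\mathrm{Var}(y)$, which links $R^{2}$ to the MSE column of the tables: for the same test labels, a lower MSE always means a higher $R^{2}$. The function name `r2_score` is illustrative.

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)              # total variation around the mean
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical labels and predictions: only the last prediction is off (by 2).
y_true = [1, 2, 3, 4, 5]
y_pred = [1, 2, 3, 4, 3]
r2 = r2_score(y_true, y_pred)
print(r2)  # 0.6, i.e. 60 % of the variability is explained

# Equivalent view: R^2 = 1 - MSE / Var(y), with Var(y) the population variance.
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
mean_y = sum(y_true) / len(y_true)
var_y = sum((t - mean_y) ** 2 for t in y_true) / len(y_true)
print(1.0 - mse / var_y)  # 0.6
```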

The results of the evaluation of the Random Forest Regressor are presented in Table 5.

The VOC sensor has the highest $R^{2}$ value of 0.99, indicating that the model explains about 99% of the variability in the data. The CO₂ sensor also has a high $R^{2}$ value of 0.98, indicating a strong relationship between the sensor data and the output. On the other hand, the Ethanol sensor has the lowest $R^{2}$ value of 0.56, indicating a weaker relationship. The MSE values range from 0.14 for the VOC sensor to 5.52 for the Ethanol sensor, with the CO₂ sensor having the second lowest value of 0.34. Similarly, the RMSE values range from 0.37 for the VOC sensor to 2.35 for the Ethanol sensor, with the CO₂ sensor having the second lowest value of 0.58. The MAE values range from 0.15 for the VOC sensor to 1.78 for the Ethanol sensor, with the CO₂ sensor having the second lowest value of 0.23. Finally, the SMAPE values range from 12.52% for the VOC sensor to 93.38% for the Ethanol sensor, with the CO₂ sensor having the second lowest value of 15.76%. In summary, the results of the Random Forest Regressor model show that the VOC sensor has the best performance in terms of $R^{2}$, MSE, RMSE, MAE, and SMAPE among the sensors used, while the Ethanol sensor has the worst performance.

The results of the XGBoost Regressor evaluation are shown in Table 6.

The VOC sensor has the highest $R^{2}$ value of 0.99, indicating that the model explains about 99% of the variability in the data. The CO₂ and NIR sensors also have a high $R^{2}$ value of 0.97, indicating a strong relationship between the sensor data and the output. On the other hand, the Ethanol sensor has the lowest $R^{2}$ value of 0.56, indicating a weaker relationship. The MSE values range from 0.14 for the VOC sensor to 5.51 for the Ethanol sensor, with the CO₂ sensor having the second lowest value of 0.42. Similarly, the RMSE values range from 0.38 for the VOC sensor to 2.35 for the Ethanol sensor, with the CO₂ sensor having the second lowest value of 0.65. The MAE values range from 0.18 for the VOC sensor to 1.72 for the Ethanol sensor, with the CO₂ sensor having the second lowest value of 0.30. Finally, the SMAPE values range from 24.73% for the VOC sensor to 86.91% for the Ethanol sensor, with the CO₂ sensor having the second lowest value of 30.99%. In summary, the results of the XGBoost Regressor model show that the VOC sensor has the best performance in terms of $R^{2}$, MSE, RMSE, MAE, and SMAPE among the sensors used, while the Ethanol sensor has the worst performance on all five metrics.

The NIR sensor has the best performance among the sensors used in the benchmark linear regression model, while the VOC sensor has the best performance in the Random Forest Regressor and XGBoost Regressor models. The pH sensor has the worst performance in the linear regression model, and the Ethanol sensor has the worst performance in the two tree-based models. Overall, with the Random Forest and XGBoost Regressors, the VOC sensor achieves the highest $R^{2}$ value and the best MSE, RMSE, MAE, and SMAPE values among the sensors used.
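The best and worst sensor per model and metric can be read off Tables 4, 5, and 6 mechanically. The sketch below copies the reported values into a dictionary (the layout and names are ours) and, for each metric, sorts the sensors with $R^{2}$ maximised and the error metrics minimised.

```python
# Values copied from Tables 4, 5, and 6: (R2, MSE, RMSE, MAE, SMAPE in %) per sensor.
TABLES = {
    "Linear Regression": {
        "CO2": (0.60, 5.50, 2.34, 1.70, 72.32),
        "VOC": (0.43, 7.15, 2.67, 2.25, 98.53),
        "Ethanol": (0.40, 7.49, 2.74, 2.26, 114.95),
        "pH": (0.26, 11.02, 3.32, 2.82, 121.95),
        "NIR": (0.86, 1.89, 1.37, 1.04, 63.27),
    },
    "Random Forest": {
        "CO2": (0.98, 0.34, 0.58, 0.23, 15.76),
        "VOC": (0.99, 0.14, 0.37, 0.15, 12.52),
        "Ethanol": (0.56, 5.52, 2.35, 1.78, 93.38),
        "pH": (0.76, 3.65, 1.91, 1.18, 59.92),
        "NIR": (0.97, 0.43, 0.66, 0.32, 30.80),
    },
    "XGBoost": {
        "CO2": (0.97, 0.42, 0.65, 0.30, 30.99),
        "VOC": (0.99, 0.14, 0.38, 0.18, 24.73),
        "Ethanol": (0.56, 5.51, 2.35, 1.72, 86.91),
        "pH": (0.76, 3.56, 1.89, 1.22, 64.30),
        "NIR": (0.97, 0.48, 0.69, 0.40, 35.13),
    },
}
METRICS = ("R2", "MSE", "RMSE", "MAE", "SMAPE")


def best_and_worst(model: str) -> dict:
    """Map each metric to (best sensor, worst sensor) for one model."""
    rows = TABLES[model]
    result = {}
    for k, name in enumerate(METRICS):
        # R2 is better when higher; all error metrics are better when lower.
        order = sorted(rows, key=lambda sensor: rows[sensor][k], reverse=(name == "R2"))
        result[name] = (order[0], order[-1])
    return result


for model in TABLES:
    print(model, best_and_worst(model))
```

Run as is, it reports NIR as best and pH as worst on every metric for the linear regression benchmark, and VOC as best and Ethanol as worst on every metric for both tree-based models.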

5. Discussion

A fresh pizza was used to investigate whether low-cost sensors and machine learning methods are suitable for providing better information about the spoilage date. For this purpose, CO₂, VOC, ethanol, pH, and NIR sensors were used in a defined measurement setup for storage tests. The sensor data served as input for three regression models (linear regression, random forest regression, and XGBoost regression). The evaluation revealed that accurate predictions of the fresh pizza's spoilage date can be made from low-cost sensor data with a regression model. The VOC, CO₂, and NIR sensors proved particularly insightful and, in combination with a random forest or XGBoost regressor, showed good results. Traditionally, the minimum shelf life date in the food industry is determined from empirical values and sample storage tests with microbiological and sensory analyses, with each company following its own approach. By contrast, the integration of low-cost sensors and regression models represents a significant step towards a more sustainable and efficient food industry. This approach not only improves accuracy and cost-effectiveness but also supports the preservation of resources and the promotion of sustainable food consumption.

5.1. Findings

Regarding the question: What are examples of the use of machine learning in the food industry regarding spoilage detection and what are their limitations? Machine learning is being used in the food industry to improve spoilage detection and prevention in ready-to-eat pizza and other food products. Studies have analyzed the impact of packaging materials, modified atmospheres, and storage conditions on the sensory, chemical, and microbiological properties of pizza and its components, such as mozzarella cheese and vegetables. However, the laboratory tests these studies rely on can be costly and time-consuming. Machine learning techniques such as neural networks, support vector machines, and the k-nearest neighbors algorithm, often combined with image processing, are being applied to address these limitations. These methods are used to classify food based on color and texture features, detect defects or stains, and determine the maturity and quality of fruits and vegetables. However, they also face challenges, such as the cost of the technology, the need for large data sets to train the models, and, in some cases, lower accuracy than traditional methods. Further research is needed to evaluate the efficacy of these methods in real-world scenarios.

Regarding the question: What types of sensors provide the most relevant and accurate data for determining the shelf life of fresh pizza? The evaluation of the data from the five sensors (CO₂, VOC, ethanol, pH, and NIR) indicates that the VOC sensor provides the most relevant and accurate data for determining the shelf life of fresh pizza. VOC performed best in two of the three regression models tested (Random Forest and XGBoost), and the CO₂ sensor data were a close runner-up with similar results. NIR performed best in the linear regression model but was slightly inferior in the other two models. On the other hand, the ethanol and pH sensor data gave the poorest results and are considered unsuitable for predicting the spoilage date. Hence, VOC, CO₂, and NIR sensors are recommended for collecting data to determine the shelf life of fresh pizza.

Regarding the question: What machine learning algorithm can be effectively utilized to predict the shelf life of fresh pizza using sensor data? The evaluation of linear regression, random forest regression, and XGBoost regression models found that regression models are suitable for predicting the spoilage and shelf life of fresh pizza. Ranked by SMAPE, the best result was achieved with the Random Forest Regressor using VOC sensor data, with an $R^{2}$ value of 0.99 and a SMAPE value of 12.52%. The second best was the Random Forest Regressor with CO₂ sensor data, with an $R^{2}$ value of 0.98 and a SMAPE value of 15.76%. The third best was the XGBoost Regressor with VOC sensor data, with an $R^{2}$ value of 0.99 and a SMAPE value of 24.73%. The Random Forest Regressor performed slightly better than the XGBoost Regressor, but both are capable of accurately predicting the remaining shelf life of fresh pizza. This is evident from their superior performance compared to the benchmark linear regression, which showed substantially worse results for every sensor.
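The ordering of the three best combinations above matches a sort of all fifteen model and sensor combinations by SMAPE, as the sketch below shows using the values from Tables 4, 5, and 6. The ranking depends on the metric: by $R^{2}$ (0.99 vs. 0.98) or MSE (0.14 vs. 0.34), XGBoost with VOC data would come ahead of Random Forest with CO₂ data.

```python
# SMAPE (%) per (model, sensor), copied from Tables 4, 5, and 6.
SMAPE = {
    ("Linear Regression", "CO2"): 72.32, ("Linear Regression", "VOC"): 98.53,
    ("Linear Regression", "Ethanol"): 114.95, ("Linear Regression", "pH"): 121.95,
    ("Linear Regression", "NIR"): 63.27,
    ("Random Forest", "CO2"): 15.76, ("Random Forest", "VOC"): 12.52,
    ("Random Forest", "Ethanol"): 93.38, ("Random Forest", "pH"): 59.92,
    ("Random Forest", "NIR"): 30.80,
    ("XGBoost", "CO2"): 30.99, ("XGBoost", "VOC"): 24.73,
    ("XGBoost", "Ethanol"): 86.91, ("XGBoost", "pH"): 64.30,
    ("XGBoost", "NIR"): 35.13,
}

# Lower SMAPE is better, so sort in ascending order.
ranking = sorted(SMAPE.items(), key=lambda item: item[1])
for place, ((model, sensor), value) in enumerate(ranking[:3], start=1):
    print(f"{place}. {model} + {sensor}: SMAPE {value:.2f} %")
```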

5.2. Strengths and Weaknesses

The use of low-cost sensors, such as VOC or CO₂, together with a regression model, such as Random Forest Regressor or XGBoost Regressor, provides a highly accurate and cost-effective way to determine the freshness of pizza. This allows for more precise predictions of the day of spoilage and the best-before date, ultimately reducing food waste and promoting sustainable food consumption. Food waste reduction is becoming increasingly important due to growing concerns over sustainability and the depletion of resources. By applying our approach, we can save valuable resources and have a positive impact on the environment.

Our approach to determining the freshness of pizza using low-cost sensors and regression models has several limitations. Firstly, the training data used to develop the model were collected under controlled conditions, whereas real-world conditions such as temperature and humidity fluctuations can significantly affect the freshness of the pizza. Predictions based on the model can therefore only be expected to be accurate if the food is stored under conditions similar to those of the training data. This limitation restricts the general applicability of the model, particularly to other stages of the food's life cycle such as transportation or consumption by end-users. To address this issue, additional information should be collected during these stages and added to the model as extra features. Secondly, fresh pizza is a complex food made up of numerous ingredients, including various toppings, which may affect the accuracy of the model. To determine the robustness of the approach and confirm its validity, it is critical to evaluate it on other pizza variants. Taking these limitations into account and making the necessary improvements can increase the accuracy and effectiveness of the approach for determining the freshness of pizza.

5.3. Outlook

To make the integration of VOC, CO₂, and NIR sensor data practically applicable, further research is required to address the limitations of the current proof-of-concept results. To improve data quality and the accuracy of the prediction model, a comprehensive data collection and analysis approach should be adopted. This would involve expanding the model and data collection to cover all phases of the pizza life cycle, including production, distribution, sales, and consumption. In addition, the life cycle phase should be integrated into the model as an additional feature to capture the variability of the gas sensor and NIR signals across the different phases. Further investigation of the model's predictive ability on pizzas with various toppings would also allow the robustness and informative value of the gas sensor and NIR data to be evaluated. These efforts will facilitate the practical implementation of the integrated sensor data approach in everyday settings.
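One possible way to add the life cycle phase as a feature is one-hot encoding, sketched below. The phase names come from the phases named above, but the encoding, the function name `add_phase_feature`, and the example row (the first row of Table 1) are illustrative assumptions; tree-based models such as random forests could equally use an ordinal code.

```python
from typing import List, Sequence

# Hypothetical phase vocabulary, taken from the life cycle phases named in the text.
PHASES = ("production", "distribution", "sales", "consumption")


def add_phase_feature(sensor_features: Sequence[float], phase: str) -> List[float]:
    """Append a one-hot encoding of the life cycle phase to a sensor feature vector."""
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase!r}")
    one_hot = [1.0 if p == phase else 0.0 for p in PHASES]
    return list(sensor_features) + one_hot


# Example row in the style of Table 1 (humidity, temperature, VOC signal).
row = add_phase_feature([75.33, 6.77, 26081], "distribution")
print(row)  # [75.33, 6.77, 26081, 0.0, 1.0, 0.0, 0.0]
```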

Author Contributions

Conceptualization, P.W.; methodology, P.W.; software, P.W.; validation, P.W., D.P. and M.N.; formal analysis, P.W.; investigation, D.P. and M.N.; resources, P.W., D.P. and M.N.; data curation, P.W., D.P. and M.N.; writing—original draft preparation, P.W., D.P., M.N. and S.W.; writing—review and editing, P.W. and D.P.; visualization, P.W. and D.P.; supervision, H.D., H.-J.D. and V.L.; project administration, H.-J.D., V.L. and H.D.; funding acquisition, H.-J.D., V.L. and H.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the German Federal Ministry of Education and Research (BMBF) under grant number 13FH3I03IA.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data available on request.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. United Nations. Transforming Our World: The 2030 Agenda for Sustainable Development; United Nations: New York, NY, USA, 2015.
2. Hafner, G.; Barabosz, J.; Schneider, F.; Lebersorger, S.; Scherhaufer, S.; Schuller, H.; Leverenz, D.; Kranert, M. Ermittlung der Weggeworfenen Lebensmittelmengen und Vorschläge zur Verminderung der Wegwerfrate bei Lebensmitteln in Deutschland. 2012. Available online: https://www.bmel.de/SharedDocs/Downloads/DE/_Ernaehrung/Lebensmittelverschwendung/Studie_Lebensmittelabfaelle_Langfassung.pdf?__blob=publicationFile&v=3 (accessed on 22 February 2023).
3. Gustavsson, J.; Cederberg, C.; Sonesson, U. Global Food Losses and Food Waste: Extent, Causes and Prevention. In Proceedings of the Study Conducted for the International Congress Save Food, at Interpack 2011, Düsseldorf, Germany, 16–17 May 2011; FAO: Rome, Italy, 2011.
4. Schmidt, T.G.; Schneider, F.; Leverenz, D.; Hafner, G. Food Waste in Germany-Baseline 2015; Thünen-Report; Johann-Heinrich-von-Thünen-Institut: Braunschweig, Germany, 2019; Volume 71.
5. European Union. Regulation (EU) No 1169/2011 of the European Parliament and of the Council of 25 October 2011 on the Provision of Food Information to Consumers, Amending Regulations (EC) No 1924/2006 and (EC) No 1925/2006 of the European Parliament and of the Council, and Repealing Commission Directive 87/250/EEC, Council Directive 90/496/EEC, Commission Directive 1999/10/EC, Directive 2000/13/EC of the European Parliament and of the Council, Commission Directives 2002/67/EC and 2008/5/EC and Commission Regulation (EC) No 608/2004. 2018. Available online: https://eur-lex.europa.eu/legal-content/EN/ALL/?uri=CELEX:02011R1169-20180101 (accessed on 22 February 2023).
6. Beretta, C.; Kremer-Hartmann, K.; Spielmann-Prada, G.; Züst, M.; Gantenbein-Demarchi, C.; Müller, C. Leitfaden zur Reduktion von Lebensmittelverlusten bei der Abgabe von Lebensmitteln: Rechtliche Aspekte und Lebensmittelsicherheit (Grundlagenbericht); Technical Report; ZHAW Zürcher Hochschule für Angewandte Wissenschaften: Wädenswil, Switzerland, 2021.
7. Matissek, R. Sichere Lebensmittel–Mittel zum guten Leben. In Lebensmittelsicherheit: Kontaminanten–Rückstände–Biotoxine; Springer: Berlin/Heidelberg, Germany, 2020; pp. 37–55.
8. European Union. EU-Strategie für ein Nachhaltiges Lebensmittelsystem. 2021. Available online: https://www.europarl.europa.eu/news/de/headlines/society/20200519STO79425/eu-strategie-fur-ein-nachhaltiges-lebensmittelsystem (accessed on 22 February 2023).
9. Singh, P.; Goyal, G. Modified atmosphere packaging and storage on sensory characteristics of ready-to-bake pizza. Nutr. Food Sci. 2010, 40, 299–304.
10. Singh, P.; Wani, A.; Goyal, G. Quality of Chilled Ready-to-Bake Pizza Stored in Air and under Modified Atmospheres: Microbiological and Sensory Attributes. Food Sci. Biotechnol. 2011, 20, 1–6.
11. Singh, P.; Wani, A.; Goyal, G. Shelf-Life Extension of Fresh Ready-to-Bake Pizza by the Application of Modified Atmosphere Packaging. Food Bioprocess Technol. 2012, 5, 1028–1037.
12. Fasano, L.; Gallo, C. Pizza-basis with tomato packaged with modified atmosphere: Environmental monitoring and shelf-life studies. Ind. Aliment. 2001, 40, 1039–1044.
13. Alves, R.M.V.; De Luca Sarantopoulos, C.I.G.; Van Dender, A.G.F.; De Assis Fonseca Faria, J. Stability of Sliced Mozzarella Cheese in Modified-Atmosphere Packaging. J. Food Prot. 1996, 59, 838–844.
14. Alam, T.; Goyal, G.K. Effect of MAP on microbiological quality of Mozzarella cheese stored in different packages at 7 ± 1 °C. J. Food Sci. Technol. 2011, 48, 120–123.
15. Lucera, A.; Costa, C.; Mastromatteo, M.; Conte, A.; Del Nobile, M.A. Influence of different packaging systems on fresh-cut zucchini (Cucurbita pepo). Innov. Food Sci. Emerg. Technol. 2010, 11, 361–368.
16. Manolopoulou, H.; Xanthopoulos, G.; Douros, N.; Lambrinos, G. Modified atmosphere packaging storage of green bell peppers: Quality criteria. Biosyst. Eng. 2010, 106, 535–543.
17. Oliveira, F.; Sousa-Gallagher, M.; Mahajan, P.; Teixeira, J.A. Development of shelf-life kinetic model for modified atmosphere packaging of fresh sliced mushrooms. J. Food Eng. 2012, 111, 466–473.
18. Ireri, D.; Belal, E.; Okinda, C.; Makange, N.; Ji, C. A computer vision system for defect discrimination and grading in tomatoes using machine learning and image processing. Artif. Intell. Agric. 2019, 2, 28–37.
19. Kanade, A.; Shaligram, A. Prepackaging Sorting of Guava Fruits using Machine Vision based Fruit Sorter System based on K-Nearest Neighbor Algorithm. Int. J. Sci. Res. Comput. Sci. Eng. Inf. Technol. 2018, 3, 1972–1977.
20. Liang, X.; Jia, X.; Huang, W.; He, X.; Li, L.; Fan, S.; Li, J.; Zhao, C.; Zhang, C. Real-Time Grading of Defect Apples Using Semantic Segmentation Combination with a Pruned YOLO V4 Network. Foods 2022, 11, 3150.
21. Basak, J.K.; Madhavi, B.G.K.; Paudel, B.; Kim, N.E.; Kim, H.T. Prediction of Total Soluble Solids and pH of Strawberry Fruits Using RGB, HSV and HSL Colour Spaces and Machine Learning Models. Foods 2022, 11, 2086.
22. Kumar, R.A.; Rajpurohit, V.S.; Bidari, K.Y. Multi Class Grading and Quality Assessment of Pomegranate Fruits Based on Physical and Visual Parameters. Int. J. Fruit Sci. 2019, 19, 372–396.
23. Ropelewska, E.; Sabanci, K.; Aslan, M.F. The Changes in Bell Pepper Flesh as a Result of Lacto-Fermentation Evaluated Using Image Features and Machine Learning. Foods 2022, 11, 2956.
24. Basile, T.; Marsico, A.D.; Perniola, R. Use of Artificial Neural Networks and NIR Spectroscopy for Non-Destructive Grape Texture Prediction. Foods 2022, 11, 281.
25. Xiong, Y.; Li, Y.; Wang, C.; Shi, H.; Wang, S.; Yong, C.; Gong, Y.; Zhang, W.; Zou, X. Non-Destructive Detection of Chicken Freshness Based on Electronic Nose Technology and Transfer Learning. Agriculture 2023, 13, 496.
26. Kim, T.H.; Kim, J.H.; Kim, J.Y.; Oh, S.E. Egg Freshness Prediction Model Using Real-Time Cold Chain Storage Condition Based on Transfer Learning. Foods 2022, 11, 3082.
27. Darwish, A.; Ricci, M.; Zidane, F.; Vasquez, J.A.T.; Casu, M.R.; Lanteri, J.; Migliaccio, C.; Vipiana, F. Physical Contamination Detection in Food Industry Using Microwave and Machine Learning. Electronics 2022, 11, 3115.
28. Fengou, L.C.; Liu, Y.; Roumani, D.; Tsakanikas, P.; Nychas, G.J.E. Spectroscopic Data for the Rapid Assessment of Microbiological Quality of Chicken Burgers. Foods 2022, 11, 2386.
29. Cheng, J.H.; Sun, D.W. Partial Least Squares Regression (PLSR) Applied to NIR and HSI Spectral Data Modeling to Predict Chemical Properties of Fish Muscle. Food Eng. Rev. 2017, 9, 36–49.
30. Faqeerzada, M.A.; Lohumi, S.; Kim, G.; Joshi, R.; Lee, H.; Kim, M.S.; Cho, B.K. Hyperspectral Shortwave Infrared Image Analysis for Detection of Adulterants in Almond Powder with One-Class Classification Method. Sensors 2020, 20, 5855.
31. Kang, Z.; Zhao, Y.; Chen, L.; Guo, Y.; Mu, Q.; Wang, S. Advances in Machine Learning and Hyperspectral Imaging in the Food Supply Chain. Food Eng. Rev. 2022, 14, 596–616.
32. Özdoğan, G.; Lin, X.; Sun, D.W. Rapid and noninvasive sensory analyses of food products by hyperspectral imaging: Recent application developments. Trends Food Sci. Technol. 2021, 111, 151–165.
33. Franz Binder GmbH & Co. Elektrische Bauelemente KG. Model MKF 115. Available online: https://www.binder-world.com/int-en/product/mkf-115 (accessed on 10 March 2023).
34. Sensirion. SCD30. Available online: https://sensirion.com/products/catalog/SCD30/ (accessed on 22 February 2023).
35. Sensirion. SEK-SVM40. Available online: https://sensirion.com/products/catalog/SEK-SVM40/ (accessed on 22 February 2023).
36. Vernier Software & Technology. Go Direct® Ethanol Vapor. Available online: https://www.vernier.com/product/go-direct-ethanol-vapor/ (accessed on 22 February 2023).
37. Xylem Inc. Multi-Parameter Portable Meter MultiLine® Multi 3620 IDS. Available online: https://www.xylemanalytics.com/en/general-product/id-431/multi-parameter-portable-meter-multiline%C2%AE-multi-3620-ids (accessed on 22 February 2023).
38. Xylem Inc. IDS pH Penetration Measurement with SenTix® Sp-T 900. Available online: https://www.xylemanalytics.com/en/general-product/id-68/wtw---ids-ph-penetration-measurement-with-sentix%C2%AE-sp-t-900 (accessed on 22 February 2023).
39. Tellspec. Available online: https://tellspec.com/ (accessed on 22 February 2023).
40. InnoSpectra Corporation. Standard Wavelength NIR Spectrometer. Available online: https://www.inno-spectra.com/en/product (accessed on 22 February 2023).
41. Murphy, K.P. Machine Learning: A Probabilistic Perspective; The MIT Press: Cambridge, MA, USA, 2012.
42. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
43. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, San Francisco, CA, USA, 13–17 August 2016; Association for Computing Machinery: New York, NY, USA, 2016; pp. 785–794.
44. Colin Cameron, A.; Windmeijer, F.A. An R-squared measure of goodness of fit for some common nonlinear regression models. J. Econom. 1997, 77, 329–342.
45. Wallach, D.; Goffinet, B. Mean squared error of prediction as a criterion for evaluating and comparing system models. Ecol. Model. 1989, 44, 299–306.
46. Chai, T.; Draxler, R.R. Root mean square error (RMSE) or mean absolute error (MAE)? – Arguments against avoiding RMSE in the literature. Geosci. Model Dev. 2014, 7, 1247–1250.
47. Kreinovich, V.; Nguyen, H.T.; Ouncharoen, R. How to Estimate Forecasting Quality: A System-Motivated Derivation of Symmetric Mean Absolute Percentage Error (SMAPE) and Other Similar Characteristics; Technical Report UTEP-CS-14-53; The University of Texas at El Paso: El Paso, TX, USA, 2014.
48. de Myttenaere, A.; Golden, B.; Le Grand, B.; Rossi, F. Mean Absolute Percentage Error for regression models. Neurocomputing 2016, 192, 38–48.
49. Groß, J. Linear Regression; Springer: Berlin/Heidelberg, Germany, 2003; Volume 175.

Figure 1. General Concept.

Figure 2. Measurement Setup.

Figure 3. Regression Model Concept.

Table 1. Example Subset VOC data.

Humidity	Temperature	VOC	Label
75.33	6.77	26081	7
75.30	6.77	26088	7
75.33	6.79	26091	7
75.32	6.77	26089	7
75.32	6.78	26086	7

Table 2. Best Model Parameters Random Forest Regressor.

Sensors	n_estimators	min_samples_split	max_depth
CO₂	20	2	None
VOC	10	10	None
Ethanol	50	2	5
pH	500	20	10
NIR	100	2	10

Table 3. Best Model Parameters XGBoost Regressor.

Sensors	n_estimators	max_depth	learning_rate	gamma
CO₂	20	10	1	10
VOC	20	10	0.5	0
Ethanol	1000	5	0.1	10
pH	50	5	0.1	10
NIR	100	5	0.1	0
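For reuse, Tables 2 and 3 can be kept as plain dictionaries. The key names below assume the keyword arguments of scikit-learn's `RandomForestRegressor` and of the `XGBRegressor` class in the xgboost Python package, which the column headers resemble; this section does not state which libraries were used.

```python
# Best hyperparameters from Tables 2 and 3.
# Assumption: keys mirror scikit-learn RandomForestRegressor / xgboost XGBRegressor kwargs.
RF_PARAMS = {
    "CO2": {"n_estimators": 20, "min_samples_split": 2, "max_depth": None},
    "VOC": {"n_estimators": 10, "min_samples_split": 10, "max_depth": None},
    "Ethanol": {"n_estimators": 50, "min_samples_split": 2, "max_depth": 5},
    "pH": {"n_estimators": 500, "min_samples_split": 20, "max_depth": 10},
    "NIR": {"n_estimators": 100, "min_samples_split": 2, "max_depth": 10},
}
XGB_PARAMS = {
    "CO2": {"n_estimators": 20, "max_depth": 10, "learning_rate": 1, "gamma": 10},
    "VOC": {"n_estimators": 20, "max_depth": 10, "learning_rate": 0.5, "gamma": 0},
    "Ethanol": {"n_estimators": 1000, "max_depth": 5, "learning_rate": 0.1, "gamma": 10},
    "pH": {"n_estimators": 50, "max_depth": 5, "learning_rate": 0.1, "gamma": 10},
    "NIR": {"n_estimators": 100, "max_depth": 5, "learning_rate": 0.1, "gamma": 0},
}

# Sanity check: both tables cover the same five sensors.
assert RF_PARAMS.keys() == XGB_PARAMS.keys()
print(sorted(RF_PARAMS))
```

With both libraries installed, a model could then be created as `RandomForestRegressor(**RF_PARAMS["VOC"])` or `XGBRegressor(**XGB_PARAMS["VOC"])`. In XGBoost, `gamma` is the minimum loss reduction required to make a further split, so a value of 10 (CO₂, Ethanol, pH) makes the trees more conservative than a value of 0.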

Table 4. Performance Metrics Benchmark Linear Regression.

Sensors	$R^{2}$	MSE	RMSE	MAE	SMAPE
CO₂	0.60	5.50	2.34	1.70	72.32%
VOC	0.43	7.15	2.67	2.25	98.53%
Ethanol	0.40	7.49	2.74	2.26	114.95%
pH	0.26	11.02	3.32	2.82	121.95%
NIR	0.86	1.89	1.37	1.04	63.27%

Table 5. Performance Metrics Random Forest Regressor.

Sensors	$R^{2}$	MSE	RMSE	MAE	SMAPE
CO₂	0.98	0.34	0.58	0.23	15.76%
VOC	0.99	0.14	0.37	0.15	12.52%
Ethanol	0.56	5.52	2.35	1.78	93.38%
pH	0.76	3.65	1.91	1.18	59.92%
NIR	0.97	0.43	0.66	0.32	30.80%

Table 6. Performance Metrics XGBoost Regressor.

Sensors	$R^{2}$	MSE	RMSE	MAE	SMAPE
CO₂	0.97	0.42	0.65	0.30	30.99%
VOC	0.99	0.14	0.38	0.18	24.73%
Ethanol	0.56	5.51	2.35	1.72	86.91%
pH	0.76	3.56	1.89	1.22	64.30%
NIR	0.97	0.48	0.69	0.40	35.13%
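Because Equation (2) defines RMSE as the square root of MSE, the MSE and RMSE columns of Tables 4, 5, and 6 can be cross-checked. Both are reported to two decimals, so an exact comparison of round(√MSE, 2) with the RMSE column flags two entries (linear regression with CO₂: √5.50 ≈ 2.345, reported 2.34; XGBoost with VOC: √0.14 ≈ 0.374, reported 0.38). The sketch below instead asks whether some unrounded RMSE value is compatible with both reported numbers; under this rounding-aware check all fifteen pairs are consistent. The names are illustrative.

```python
import math

# (MSE, RMSE) per (model, sensor), copied from Tables 4, 5, and 6.
PAIRS = {
    ("Linear Regression", "CO2"): (5.50, 2.34),
    ("Linear Regression", "VOC"): (7.15, 2.67),
    ("Linear Regression", "Ethanol"): (7.49, 2.74),
    ("Linear Regression", "pH"): (11.02, 3.32),
    ("Linear Regression", "NIR"): (1.89, 1.37),
    ("Random Forest", "CO2"): (0.34, 0.58),
    ("Random Forest", "VOC"): (0.14, 0.37),
    ("Random Forest", "Ethanol"): (5.52, 2.35),
    ("Random Forest", "pH"): (3.65, 1.91),
    ("Random Forest", "NIR"): (0.43, 0.66),
    ("XGBoost", "CO2"): (0.42, 0.65),
    ("XGBoost", "VOC"): (0.14, 0.38),
    ("XGBoost", "Ethanol"): (5.51, 2.35),
    ("XGBoost", "pH"): (3.56, 1.89),
    ("XGBoost", "NIR"): (0.48, 0.69),
}
HALF_STEP = 0.005  # half of the last reported digit (two decimals)


def rounding_consistent(mse: float, rmse: float) -> bool:
    """True if an unrounded RMSE that rounds to `rmse` can have a square that rounds to `mse`."""
    lo = max(rmse - HALF_STEP, 0.0) ** 2  # smallest MSE implied by the reported RMSE
    hi = (rmse + HALF_STEP) ** 2          # largest MSE implied by the reported RMSE
    return lo <= mse + HALF_STEP and hi >= mse - HALF_STEP


# Exact comparison after rounding sqrt(MSE): flags pairs that differ only by rounding.
naive_mismatches = [
    key for key, (m, r) in PAIRS.items() if abs(round(math.sqrt(m), 2) - r) > 1e-9
]
print(naive_mismatches)
print(all(rounding_consistent(m, r) for m, r in PAIRS.values()))  # True
```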

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).


MDPI and ACS Style

Wunderlich, P.; Pauli, D.; Neumaier, M.; Wisser, S.; Danneel, H.-J.; Lohweg, V.; Dörksen, H. Enhancing Shelf Life Prediction of Fresh Pizza with Regression Models and Low Cost Sensors. Foods 2023, 12, 1347. https://doi.org/10.3390/foods12061347

