Article

A Pork Price Prediction Model Based on a Combined Sparrow Search Algorithm and Classification and Regression Trees Model

1 Institute of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China
2 Chongqing Engineering Research Center of Educational Big Data Intelligent Perception and Application, Chongqing 401331, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(23), 12697; https://doi.org/10.3390/app132312697
Submission received: 9 October 2023 / Revised: 16 November 2023 / Accepted: 21 November 2023 / Published: 27 November 2023
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

The frequent fluctuation of pork prices has seriously affected the sustainable development of the pork industry. The accurate prediction of pork prices can not only help pork practitioners make scientific decisions but also help them to avoid market risks, which is the only way to promote the healthy development of the pork industry. Therefore, to improve the prediction accuracy of pork prices, this paper first combines the Sparrow Search Algorithm (SSA) and traditional machine learning model, Classification and Regression Trees (CART), to establish an SSA-CART optimization model for predicting pork prices. Secondly, based on the Sichuan pork price data during the 12th Five-Year Plan period, the linear correlation between piglet, corn, fattening pig feed, and pork price was measured using the Pearson correlation coefficient. Thirdly, the MAE fitness value was calculated by combining the validation set and training set, and the hyperparameter “MinLeafSize” was optimized via the SSA. Finally, a comparative analysis of the prediction performance of the White Shark Optimizer (WSO)-CART model, CART model, and Simulated Annealing (SA)-CART model demonstrated that the SSA-CART model has the best prediction of pork price (compared with a single decision tree, R2 increased by 9.236%), which is conducive to providing support for pork price prediction. The accurate prediction of pork prices with an optimized machine learning model is of great practical significance for stabilizing pig production, ensuring the sustainable growth of farmers’ income, and promoting sound economic development.

1. Introduction

Pork is one of the most important foods in China and plays a pivotal role in meeting the growing needs of the people for a better life. At the same time, it also has an important impact on the healthy and sustainable development of the economy and society. Of these impacts, price is the core factor. On the one hand, pork is an important part of residents’ daily consumption. Changes in pork prices will significantly affect the consumer price index (CPI) in China [1,2]. More seriously, the rise in pork prices may lead to an overall and sustained increase in prices, which in turn will cause inflation [3]. On the other hand, fluctuations in pork prices can directly affect agricultural markets, water markets, grain markets, oil markets, and so on [4,5,6,7,8]. It can be seen that the construction of an efficient and accurate pork price forecasting model is not only conducive to pig enterprises and farmers making more sensible decisions about pork prices, but is also conducive to providing recommendations for relevant departments to formulate macroeconomic policies. At the same time, it can help optimize the pork market environment [9].
In recent years, a number of intelligent optimization algorithms have been put forth; researchers have applied them to image processing, data analysis, fault detection, feature selection, support vector machines, wireless sensors, neural networks, and other technological fields [10,11,12,13]. Intelligent optimization algorithms excel at solving path-planning problems and have been widely used in machine learning [14,15,16,17]. Common intelligent optimization techniques include the Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Optimization, Whale Optimization, and so forth [18,19,20,21,22,23]. Because they offer diverse search strategies and adaptive parameter tuning, these algorithms have become powerful tools for searching for the optimal parameters of different models.
In previous research, efforts have been made to predict pork prices by integrating optimization algorithms with artificial intelligence techniques. Ref. [24] proposed a chaotic neural network model based on the Genetic Algorithm (GA-CNN), which optimized the connection weights between the nodes of the input, hidden, and output layers. According to the experimental findings, when the GA-CNN model was used to predict 1603 daily samples, the relative error between the predicted and real data was less than 0.5%, demonstrating the excellent performance of the GA-CNN [25,26,27]. Therefore, in order to further enhance the prediction accuracy of pork prices, our study adopts the CART decision tree technique and proposes a pork price prediction model based on parameter optimization using the Sparrow Search Algorithm (SSA). The proposed model addresses the impact of uncertain parameter settings on prediction accuracy and accelerates model convergence. By incorporating an optimized iterative combination and hybrid algorithm into the CART model, this experiment shows that the proposed method can obtain the optimal MinLeafSize without empirical guidance, which reduces the reliance on human experience in selecting the MinLeafSize of the decision tree and allows the optimized CART model to achieve good predictive performance.
The main contributions of this paper are as follows:
(1)
This paper first uses a recent optimization algorithm, the Sparrow Search Algorithm (SSA), to optimize the MinLeafSize of the decision tree, reducing the overfitting risk of the model, overcoming the limitations of single-model prediction, and exploring a new way to address the decision tree's tendency to overfit.
(2)
In this paper, the correlation coefficients between the variables are measured using the Pearson correlation coefficient and visualized via heat maps.
(3)
Different from the traditional method of optimizing hyperparameters on the training set alone, this study divided the dataset into three subsets (training, validation, and test) and updated the search strategy by calculating the MAE between the validation set and the predictions of the model trained on the training set.
(4)
We compared the performance of multiple combination models by combining different optimization algorithms and decision trees, and found an optimal combination model for pork price prediction.
(5)
Compared with the traditional Classification and Regression Trees (CART) model, the SA-CART model, and the WSO-CART model, the proposed SSA-CART combination model has higher prediction accuracy and is better suited to the current pork price dataset. This algorithm not only improves the accuracy of the pork price forecasting model, but also provides a new method for pork price forecasting. In general, compared with the existing algorithms, this model significantly improves the search accuracy, convergence speed, and stability of pork price prediction, and it better avoids local optima.
The main process is depicted in Figure 1. The initial data preprocessing step includes data preparation, data standardization, and data splitting. Subsequently, the preprocessed data are fed into four different models for training. Finally, the predicted results are evaluated using a variety of metrics.
The rest of this paper is organized as follows. Section 2 introduces an overview of related work. Section 3 introduces the mathematical theory and flow chart of the SSA-CART model. Section 4 describes the dataset in detail. Section 5 describes the specifics of the experiment and the comparative experiment. Section 6 summarizes the main research content of this paper, and discusses the future recommendations.

2. Related Work

According to the different research data, the forecasting methods for pork prices can be categorized into single factor methods and multi-element methods. In addition, based on the different model structures, these methods can be divided into a single model and a combined model. Moreover, a number of scholars in related fields have adopted a combination of machine learning models and deep learning algorithms to address the issue of pork price prediction.

2.1. The Multiple-Factor Method

The prediction of pork prices faces significant challenges, and many researchers have combined different machine learning models and deep learning algorithms to tackle this problem. Liu et al. (2019) proposed a similar-subsequence search with support vector regression to solve the problem of pseudo-cycles caused by varying cycle lengths; the method was tested on 18 samples of weekly data and compared against the support vector regression (SVR), Wavelet-SVR, and BPNN models. The experiments demonstrated that the similar-subsequence search with support vector regression successfully resolved the pseudo-periodic issue and obtained greater precision, validating the effectiveness of the model, but its predictions were too smooth and could not fit the fluctuations of the real price series well [28,29,30,31,32,33]. To reduce the feature dimensionality in pork prices, researchers [34] utilized a topic modeling technique (LDA) together with Pearson correlation to select the top 20 features from an initial set of 862 features. They then employed an LSTM to construct a pork price prediction model and endeavored to refine its performance through feature selection. Despite this, a considerable gap remained between the predicted and actual values.

2.2. The Combining Model

Attempting to find the best way to address cyclical price fluctuations, seasonal variations, and irregular fluctuations in pork prices, a hybrid model combining loess-based Seasonal-Trend decomposition (STL), SVR, and the Autoregressive Moving Average (ARMA) model was proposed [35]; it captures the price trend over time and consequently predicts the price of hogs in the next breeding cycle. Comparing STL-SVR-ARMA with Bayesian Linear Regression, Lasso regression, and Random Forest, their experiments demonstrated that the model was a promising alternative for pork price prediction; owing to the extended time series, the proposed hybrid model outperforms the selected competitors, but the pork farming cycle in that study is as long as 4–5 months, which is not conducive to short-term control of pork prices [36]. Researchers [37] applied the GA-SVR modeling algorithm to pork price prediction to deal with abnormal fluctuations in pork prices, which are characterized by nonlinearity, uncertainty, phasing, and spatial and temporal characteristics. The Grey Correlation Degree (GCD) is utilized to assess the correlation of each element, similar elements are aggregated, the GA-SVR model is used to predict the aggregated elements, and the prediction outcomes of each element are then summed to obtain the final result. This model incorporates multiple modules to enhance prediction performance and robustness compared with a single SVR model. However, its structure is complicated and slow, which is unfavorable for practical implementations [38,39].
In previous studies, researchers have employed various feature extraction techniques to analyze the factors that have a more significant impact on pork prices. In addition, researchers have achieved a higher degree of model fit by extending the time series and manually adjusting model parameters. Table 1 summarizes the models that different researchers have used to predict pork prices and the conclusions of those studies. Nevertheless, these methods suffer from variable uncertainty, strong subjectivity, and poor model interpretability [40,41]. In addition, other scholars have used hybrid models based on combination, integration, and optimization to improve model accuracy; these methods perform well during training, but they exhibit slow convergence and overfitting during testing, which leads to large errors in practical applications [42].
Based on the above literature review, it can be seen that how to forecast pork prices efficiently has been discussed from different perspectives. Most of the work in this field utilizes LSTMs and neural networks to help pork practitioners obtain a more accurate picture of pork price trends. However, resource limitations and time constraints pose challenges to researchers, who then face difficulties in developing feasible and effective prediction models. While utilizing the convolutional layers of a neural network to extract price features can lead to high accuracy, this comes at the cost of huge computational resources [43]. Furthermore, running on a GPU incurs significant equipment costs, and the training duration increases with the number of iterations; these challenges result in a notable depletion of both human and material resources. If a traditional machine learning method is used to predict prices on a CPU, researchers need no additional investment or equipment cost, and the CPU offers wider compatibility and flexibility, simpler operation, and lower power consumption, which is highly convenient for pork-related practitioners.
At present, scholars in related fields have not deeply integrated the intelligent optimization algorithm SSA with decision trees for pork price prediction. Therefore, based on the above investigation and analysis, this paper first combines the search advantages of heuristic algorithms with the high interpretability of the decision tree model and proposes a new swarm-intelligence optimization technique, namely SSA-CART (Sparrow Search Algorithm and Classification and Regression Trees). Secondly, the Pearson correlation coefficient was used to calculate the correlations between the factors influencing pork prices, and the results were visualized. Thirdly, the hyperparameters of the decision tree model were optimized. In this paper, the accuracy of the decision tree optimized via the SSA algorithm is significantly improved, and this improvement helps deepen the understanding of price trends for small-scale agricultural products and stabilize agricultural product prices (corn, wheat, etc.).

3. Methodology

With the rapid development of machine learning technology, its applications have become increasingly widespread. In regression prediction problems, decision trees (CART) are widely used due to their simplicity and efficiency in solving complex real-life problems. However, the performance of the CART model depends on the selection of model hyperparameters. Manual pruning techniques are subjective and require considerable time and cost. Moreover, manual operations lack certainty and cannot guarantee the discovery of globally optimal hyperparameters for decision trees. Traditional decision tree optimization mainly includes pre-pruning and post-pruning. Pre-pruning cannot grasp the global information of the tree and has a certain blindness: it is difficult to determine whether the child nodes of a pruned node would have been worth keeping, which may cause the decision tree to stop growing prematurely, so the optimal decision tree cannot be obtained. In addition, post-pruning is usually based on statistical knowledge, and some of its parameters depend on prior statistical laws, the domain knowledge of experts, or certain assumptions, and repeated comparative tests are often needed to obtain satisfactory results [44].
Compared with other machine learning models, using decision trees for the regression prediction of pork prices not only allows an intuitive visualization of the price prediction process but also does not require complex computing power; the model can be computed efficiently on standard hardware at low computational cost. Considering the drawbacks of traditional manual hyperparameter tuning, we adopted intelligent optimization algorithms to optimize the decision tree regression algorithm. The Sparrow Search Algorithm (SSA) is known for its rapid convergence and reliable stability in finding optimal parameters through discoverers, followers, and alerters, and it has emerged as a research hotspot in the field of algorithm optimization. The SSA is based on the behavior of a sparrow population and solves the optimization problem of an objective function. The principle of the decision tree is to generate trees in different positional spaces, while the SSA precisely searches for optimal positions until the optimal solution to the optimization problem is found. Based on this, we propose a Sparrow Search Algorithm to optimize the CART model for predicting pork prices. We use the MinLeafSize of the decision tree as the input of the Sparrow Search Algorithm and use the prediction error as the fitness value to determine the optimal MinLeafSize. By doing so, we obtain the optimal MinLeafSize for the CART model and improve the accuracy of pork price prediction. Our research and development process is shown in Figure 2.
Figure 2 shows the specific details of the CART model and the SSA algorithm. The SSA-CART model proposed in this paper uses the search ability of the SSA to find the optimal value of the hyperparameter "MinLeafSize" in the CART model. The search result is then returned to the CART model and validated using the test set. Finally, the predictive performance of the model is evaluated using the model's evaluation indicators. The major advantage of this approach is that it eliminates the need for manual intervention, as the SSA directly finds the global optimum. As a result, it replaces the traditional pre-pruning and post-pruning optimization of decision trees with an intelligent optimization algorithm. The decision tree optimized with intelligent optimization algorithms has been shown to significantly reduce the time and effort required while improving the predictive performance of the model.
A decision tree has many parameters and is prone to overfitting. By employing the SSA to optimize the hyperparameters of the decision tree and then incorporating the optimal hyperparameters into the model for prediction, the generalization ability of the decision tree is improved. Consequently, the CART model acquires notable adaptability and robustness, further enhancing the accuracy of the pork price prediction model.

3.1. Decision Tree Regression

As artificial intelligence technology has advanced, decision trees have become a frequently used machine learning method. They are widely used in the medical field, supporting clinicians in diagnosis and treatment plan formulation [45,46]. Decision trees are used in the financial sector for risk assessment and credit evaluation. Decision tree regression is also widely used in industrial production to anticipate product sales, optimize production processes, and detect production management concerns such as defects [47,48,49,50,51].
When it comes to logistics and supply chain management, decision tree regression assists with demand forecasting, inventory optimization, and estimating traffic flow [52,53]. For sales and marketing, decision tree regression proves valuable in market segmentation, hotspot analyses, sales forecasting, and other decision-making aspects. Environmental science also benefits from decision tree regression, such as air quality predictions, water quality assessment, climate change, and sports analyses [54,55].
Decision tree regression is a kind of regression method based on the decision tree algorithm, which is used to predict continuous target variables [56]. It creates a predictive model by dividing the feature space into rectangular areas, and at each leaf node, the prediction result is derived using the mean of the samples in the relevant region [57]. The pros and cons of the decision tree algorithm are illustrated in Table 2.
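As a concrete illustration of this splitting-and-averaging behavior, the following minimal sketch fits a regression tree with scikit-learn; the synthetic data, the feature meanings, and the use of min_samples_leaf as an analogue of the MinLeafSize hyperparameter discussed later are assumptions of this sketch, not the authors' setup.

```python
# Minimal sketch: fitting a regression tree that predicts with leaf means.
# Data and feature meanings are illustrative placeholders.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(10, 20, size=(200, 3))                         # e.g., piglet, corn, feed prices
y = 1.5 * X[:, 0] + 0.4 * X[:, 1] + rng.normal(0, 0.5, 200)    # synthetic pork price

# min_samples_leaf limits how small a leaf may become, which restricts the
# partition of the feature space and reduces the risk of overfitting.
tree = DecisionTreeRegressor(min_samples_leaf=5, random_state=0)
tree.fit(X, y)

# Each prediction is the mean target value of the training samples that fall
# into the same rectangular region (leaf) of the feature space.
print(tree.predict(X[:3]))
```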

3.2. Sparrow Search Algorithm

The Sparrow Search Algorithm (SSA) is a novel swarm intelligence optimization algorithm presented in 2020. It is primarily inspired by the foraging and anti-predatory behaviors of sparrows [58]. The SSA is well suited to optimizing artificial intelligence and machine learning models.
Several investigations have been conducted on improving prediction models using the SSA. With the aim of enhancing the precision of fall detection, ref. [59] built a fall prediction model based on an SSA-BP neural network, tuned it to pick the appropriate number of sliding windows, and achieved accurate fall detection. In an attempt to enhance oil recovery, ref. [60] constructed an SSA-ANN model in which the weights, biases, and hyperparameters of the ANN were optimized using the SSA; their experiments showed a significant improvement in oil-recovery prediction accuracy [61,62,63]. Ref. [64] proposed the SSA-LSTM model, which employs the SSA for waveform fitting and achieves an accuracy of more than 95% for each LSTM network. As a result, multiple studies have demonstrated the efficiency of the SSA in tuning model hyperparameters.
The primary notion of the SSA algorithm is as follows:
(1)
Sparrow foraging behavior simulations: When foraging, sparrows migrate in various areas and directions in search of food. The SSA algorithm represents the solution space of an optimization problem as a distribution of food, where individual organisms search for optimal solutions, resembling the foraging behavior of sparrows in search of food.
(2)
Individual sparrows and populations: Through its own actions and interactions with others, each individual represents a solution and searches for the best one. The population contains many individuals that cooperate to enhance the search results.
(3)
Flight and position updates: In each iteration, each individual flies according to its position and speed, and the direction and distance of the flight are influenced by the individual and global optimal solutions. The individual's position is then updated based on the outcome of the flight, progressively approaching the optimal solution.
(4)
Adaptation assessments: The value of the objective function is used to assess each individual's fitness. The better the fitness value, the closer the individual is to the optimal solution.
(5)
Knowledge transfer and updates: Individuals monitor the locations and fitness values of others and update their own behavior accordingly, broadening the search through knowledge transmission and updating.
In the sparrow algorithm, the basic parameters, such as the population size and the number of iterations, are specified, and the proportion of discoverers is also set. Assuming the number of sparrows in the population is $N$, the dimension of the optimal solution to be searched is $D$, the position of each sparrow is $X = (x_1, x_2, \ldots, x_D)$, and the fitness is $f_i = f(x_1, x_2, \ldots, x_D)$, then the initialized population can be represented as
X = \begin{bmatrix} x_1^1 & x_1^2 & \cdots & x_1^D \\ x_2^1 & x_2^2 & \cdots & x_2^D \\ \vdots & \vdots & \ddots & \vdots \\ x_N^1 & x_N^2 & \cdots & x_N^D \end{bmatrix}
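As an illustration only, this initialization can be sketched in Python as follows; the bounds, population size, and random seed are placeholders rather than the authors' settings.

```python
# Minimal sketch of the SSA population initialization described above:
# N sparrows, each a D-dimensional candidate solution drawn inside [lb, ub].
import numpy as np

def init_population(N, D, lb, ub, rng):
    # X has shape (N, D); row i is the position of sparrow i.
    return lb + (ub - lb) * rng.random((N, D))

rng = np.random.default_rng(42)
X = init_population(N=50, D=1, lb=1.0, ub=30.0, rng=rng)
```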
The SSA algorithm comprises three components: a discoverer, follower, and alert. Using the mean square error between predicted and true values as the fitness value, the ratio of sparrow discoverers and followers is updated. Then, the iteration updates are carried out according to the position update rules of each functional population, which adjust their locations based on the principles listed below.
  • Discoverer
Discoverers, as the individuals in the population that find better food, are responsible for guiding their followers. In each generation, the top PN sparrows with the best fitness values are selected as discoverers, accounting for 20% of the population. The formula for updating their position is as follows:
x_{i,j}^{t+1} = \begin{cases} X_{i,j}^{t} \times \exp\left(\dfrac{-i}{\alpha \times T}\right), & \text{if } R_2 < ST \\ X_{i,j}^{t} + Q \times L, & \text{if } R_2 \geq ST \end{cases}
Here, $x_{i,j}^{t}$ is the position of the $i$th sparrow in dimension $j$ of the search space at iteration $t$; $T$ is the maximum number of iterations; $\alpha \in (0, 1]$ is a random number; $R_2 \in [0, 1]$ is the warning value and $ST \in [0.5, 1]$ is the safety value; $Q$ is a random value; and $L$ is a $1 \times d$ matrix with all elements equal to 1. When $R_2 < ST$, there are no predators in the foraging environment and the discoverers conduct extensive searches in the area; when $R_2 \geq ST$, scouts have detected the presence of predators, and the group moves rapidly toward a safe area.
  • Follower
Except for the discoverers, all remaining N-PN individuals serve as followers, accounting for 80%, and their position update formula is as follows:
x_{i,j}^{t+1} = \begin{cases} Q \times \exp\left(\dfrac{X_{worst}^{t} - x_{i,j}^{t}}{i^{2}}\right), & \text{if } i > 0.5n \\ X_{P}^{t+1} + \left| x_{i,j}^{t} - X_{P}^{t+1} \right| \times A^{+} \times L, & \text{if } i \leq 0.5n \end{cases}
$X_{P}^{t+1}$ is the optimal position occupied by the discoverer at iteration $t+1$; $X_{worst}^{t}$ is the global worst position; $A$ is a $1 \times d$ matrix whose elements are randomly assigned 1 or −1, and $A^{+} = A^{T}(AA^{T})^{-1}$. When $i > 0.5n$, the $i$th follower, having failed to obtain food and being in a low-energy state, must fly to other areas to forage; when $i \leq 0.5n$, the $i$th follower follows the discoverer's foraging center and forages randomly near that center position.
  • Alert
A certain number of alerters are selected to perform reconnaissance and early-warning tasks, abandoning food when facing danger; it is assumed that each generation randomly selects SD sparrows for reconnaissance and warning. The formula for updating their position is as follows:
x_{i,j}^{t+1} = \begin{cases} X_{best}^{t} + \beta \times \left| x_{i,j}^{t} - X_{best}^{t} \right|, & \text{if } f_i > f_g \\ x_{i,j}^{t} + K \times \left( \dfrac{\left| x_{i,j}^{t} - X_{worst}^{t} \right|}{(f_i - f_w) + \varepsilon} \right), & \text{if } f_i = f_g \end{cases}
$X_{best}^{t}$ is the global optimal position; $\beta$ is a step-size control parameter; $K \in [-1, 1]$ is a random number; $f_i$ is the fitness value of the $i$th sparrow; $f_g$ and $f_w$ are the current best and worst fitness values; and $\varepsilon$ is a very small constant. When $f_i > f_g$, the sparrow is at the edge of the population and vulnerable to predators; when $f_i = f_g$, the sparrow is at the center of the population and randomly moves closer to other sparrows; and when $f_i < f_g$, the scout does not move.
The SSA relies on the cooperative collaboration of the group, iterating by following the optimal individual and making optimal judgments through the fitness function. After the update process has been completed for the maximum number of iterations, the final optimal solution is output and substituted for the MinLeafSize parameter of the CART model for testing.
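A minimal Python sketch of one SSA iteration under these update rules is given below. It is an illustration under simplifying assumptions (for example, $A^{+}$ is replaced by a random ±1 vector, and fitness values are not recomputed between the three update stages), not the authors' implementation; it reuses the hypothetical init_population helper sketched above.

```python
# Minimal sketch of a single SSA iteration: discoverer, follower, and alerter updates.
import numpy as np

def ssa_step(X, fitness_fn, lb, ub, T, rng, pd=0.2, sd=0.1, ST=0.8):
    N, D = X.shape
    fit = np.array([fitness_fn(x) for x in X])        # smaller = better (e.g., MAE)
    order = np.argsort(fit)
    X, fit = X[order], fit[order]                      # sort sparrows by fitness
    PN = max(1, int(pd * N))                           # number of discoverers (~20%)

    # Discoverer update (safe vs. alarmed foraging).
    R2 = rng.random()
    for i in range(PN):
        if R2 < ST:
            alpha = rng.random() + 1e-12
            X[i] = X[i] * np.exp(-(i + 1) / (alpha * T))
        else:
            X[i] = X[i] + rng.normal() * np.ones(D)

    # Follower update (hungry followers fly elsewhere; others join the discoverer).
    best_producer, worst = X[0].copy(), X[-1].copy()
    for i in range(PN, N):
        if i > N / 2:
            X[i] = rng.normal() * np.exp((worst - X[i]) / ((i + 1) ** 2))
        else:
            A = rng.choice([-1.0, 1.0], size=D)        # simplified stand-in for A+
            X[i] = best_producer + np.abs(X[i] - best_producer) * A

    # Alerter update for a random subset of sparrows (~10%).
    g_best, f_best, f_worst = X[0].copy(), fit[0], fit[-1]
    for i in rng.choice(N, size=max(1, int(sd * N)), replace=False):
        if fit[i] > f_best:
            X[i] = g_best + rng.normal() * np.abs(X[i] - g_best)
        else:
            K = rng.uniform(-1, 1)
            X[i] = X[i] + K * np.abs(X[i] - worst) / (fit[i] - f_worst + 1e-12)

    return np.clip(X, lb, ub)                          # keep positions inside the bounds
```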

3.3. SSA-CART Model

The CART model is used to model pork prices in this research because it is classic, powerful, simple, and logically rigorous; more importantly, the CART model has high explanatory power. However, the decision tree is highly susceptible to overfitting, which results in poor generalization ability. Because a large number of parameters must be integrated and adjusted, manual parameter tuning is very difficult. Therefore, an optimization algorithm was selected to optimize the hyperparameters of the decision tree and mitigate the model's overfitting; the use of the SSA-CART model reduced the error of the price prediction model and increased its accuracy. The model workflow is shown in Figure 3.
The detailed optimization steps for SSA-CART are as follows:
  • Z-score normalization of the dataset;
  • Define the hyperparameter optimization range from the lower bound (LB) to the upper bound (UB), and initialize the population size pop and the maximum number of iterations;
  • Divide the dataset into the training set, validation set, and test set, and feed the training set into the decision tree model for training;
  • Calculate the MAE between the true prices of the validation set and the predictions of the model trained on the training set, and perform a fitness assessment.
Its formula is the following:
The trained model is $F_{CART}$ and the hyperparameters are $\theta$; the model is trained on the training data $(X_{train}, y_{train})$:
F_{CART} = f(X_{train}, y_{train}, \theta)
The fitness function is computed by letting the trained model predict on the validation set to obtain $\hat{y}_{valid}$:
\hat{y}_{valid} = F_{CART}(X_{valid})
fitness = \frac{1}{N} \sum \left| \hat{y}_{valid} - y_{valid} \right|
  • While the maximum number of iterations has not been reached, update the hyperparameters and retrain the model until the maximum number of iterations is attained;
  • Design the CART prediction model using the optimal hyperparameters found by the search;
  • Evaluate the regression prediction results of the four models with the metrics given below.
MAE = \frac{1}{m} \sum_{i=1}^{m} \left| y_i - \hat{y}_i \right|
MAPE = MAE / \hat{y}
MSE = \frac{1}{m} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2
RMSE = \sqrt{MSE}
R^2 = 1 - \frac{\sum_{i=1}^{m} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{m} (y_i - \bar{y})^2}
where $y_i$ is the true value, $\hat{y}_i$ is the predicted value, and $\bar{y}$ is the mean of the true values.
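A minimal sketch of this fitness evaluation and of the test-set metrics is shown below; scikit-learn's min_samples_leaf is used here as an analogue of MinLeafSize, which is an assumption of this sketch rather than the authors' MATLAB implementation.

```python
# Minimal sketch: fitness = validation-set MAE of a CART model trained with a
# candidate MinLeafSize, plus the evaluation metrics listed above.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def cart_fitness(min_leaf_size, X_train, y_train, X_valid, y_valid):
    # Train with the candidate leaf size and return the validation MAE to minimize.
    model = DecisionTreeRegressor(min_samples_leaf=max(1, int(round(min_leaf_size))),
                                  random_state=0)
    model.fit(X_train, y_train)
    return mean_absolute_error(y_valid, model.predict(X_valid))

def evaluate(model, X_test, y_test):
    # Compute MAE, MSE, RMSE, and R2 on the test set.
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    return {"MAE": mean_absolute_error(y_test, y_pred),
            "MSE": mse,
            "RMSE": float(np.sqrt(mse)),
            "R2": r2_score(y_test, y_pred)}
```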

4. Dataset Introduction

Continuous study of pork prices not only helps the regulatory authorities grasp the market and safeguard the interests of pig enterprises and farmers, but also provides a reference for price control.

4.1. Data Sources

This dataset (https://github.com/echo-wen/Pig-Meat-Sichuan, accessed on 25 August 2023) is from the 2016 Sichuan Innovation Competition [65]. The dataset contains information on pork, feed, and manpower in Sichuan from 2011 to 2015, including variables such as year, month, week, the average price of pork, the average price of piglets, and the average price of sows. Table 3 shows descriptive statistics of the variables in this dataset.

4.2. Data Processing

In this experiment, the dataset was split according to the conventional way of an 8:1:1 ratio, and the detailed information is shown in Table 4.
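A hedged sketch of such an 8:1:1 split is given below; whether the split is chronological or shuffled is an assumption of this sketch, and X and y denote the standardized features and pork prices.

```python
# Minimal sketch of an 8:1:1 split into training, validation, and test sets,
# matching the proportions in Table 4 (a simple chronological slice is assumed).
def split_811(X, y):
    n = len(X)
    i1, i2 = int(0.8 * n), int(0.9 * n)
    return (X[:i1], y[:i1]), (X[i1:i2], y[i1:i2]), (X[i2:], y[i2:])

# (X_train, y_train), (X_valid, y_valid), (X_test, y_test) = split_811(X, y)
```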

4.3. Pearson Correlation Coefficient

The correlation coefficient is a statistical metric used to quantify the degree of association between two factors, exploring the relationship between two variables to facilitate figuring out the patterns of change. The Pearson correlation coefficient is a value between −1 and 1, with values close to 1 indicating a positive correlation, values close to −1 indicating a negative correlation, and values close to 0 indicating a weaker or no correlation between the two factors.
The Pearson correlation coefficient is calculated as follows:
\rho(a, b) = \frac{\sum_{i=1}^{m} (x_{a,i} - \bar{x}_a)(x_{b,i} - \bar{x}_b)}{\sqrt{\sum_{i=1}^{m} (x_{a,i} - \bar{x}_a)^2} \sqrt{\sum_{i=1}^{m} (x_{b,i} - \bar{x}_b)^2}}
where a and b are columns.
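A minimal sketch of computing the pairwise Pearson coefficients and drawing a heat map such as Figure 4 is shown below; the pandas/seaborn workflow and the file name are assumptions of this sketch.

```python
# Minimal sketch: Pearson correlation matrix and heat map of the pork dataset.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("pig_meat_sichuan.csv")               # hypothetical file name
corr = df.corr(method="pearson", numeric_only=True)    # pairwise Pearson coefficients
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
plt.tight_layout()
plt.show()
```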
Figure 4 shows the correlations in the pork data via a heat map. As observed in Figure 4, there is a weak negative association between the price of pork and the price of wheat bran; if the price of wheat bran rises, the price of pork may fall. The strongest positive correlation is between the average price of hogs at slaughter and the pork price, reaching 0.98; accordingly, the price of pork tends to rise when the price of slaughtered hogs rises.

5. Experimental Details

The computer operating system for this experiment is Windows 10, the processor is an Intel(R) Core(TM) i7-8700 CPU @ 3.20 GHz, the RAM is 16.0 GB, the programming software is MATLAB 2022b and Python 3.11.3, the virtual environment is Anaconda 23.3.1, and the image visualization tool is Graphviz version 5.0.1.

5.1. Experimental Procedure

First, in data preprocessing, the data are standardized so that the values of different features are scaled to the same range; normalization eliminates the influence of dimensions and accelerates training and convergence. Scaling features to the same range ensures the fairness of the model and improves the interpretability of the features. In this study, we used the Z-score standardization method. Our input variables are Year (X1), Month (X2), Week (X3), Average price of hogs at slaughter (X4), Average price of piglets (X5), Average sow price (X6), Corn (X7), Wheat bran (X8), and Fattening pig compound feed (X9), and the output variable is Pork price (Y).
(1)
Calculate the mean of the original data
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i
(2)
Calculate the Standard Deviation
\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}
(3)
Z-score standardization
z_i = \frac{x_i - \mu}{\sigma}
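A minimal sketch of this column-wise Z-score standardization is given below; it is equivalent in spirit to scikit-learn's StandardScaler.

```python
# Minimal sketch of Z-score standardization applied to each feature column.
import numpy as np

def zscore(X):
    mu = X.mean(axis=0)        # per-feature mean
    sigma = X.std(axis=0)      # per-feature standard deviation
    return (X - mu) / sigma
```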
Secondly, feature selection is performed. In regression problems, compared with other machine learning models, the decision tree model has the great advantage of built-in feature selection. Therefore, in our study, we did not choose other feature engineering methods for feature selection; instead, we used the mean squared error (MSE) between the parent and leaf nodes as the criterion for feature selection.
Thirdly, when using the SSA to optimize the CART model, the first step is to set the basic parameters of the optimization algorithm: we set the population size of the SSA to 50, the maximum number of iterations to 100, the lower bound (LB) of the hyperparameter to 0, the upper bound (UB) to 30, and the dimension of the search space D to 2. Furthermore, we fitted the decision tree regression model with the Fitrtree function and optimized the hyperparameter "MinLeafSize". Next, we calculated the MAE between the true values of the partitioned validation set and the predictions of the model trained on the training set, and returned the individual with the best fitness. If the maximum number of iterations has not been reached, the hyperparameter is updated and the model is trained again, until the maximum number of iterations is reached. At last, the optimal hyperparameter "MinLeafSize" was returned with a value of four.
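To make the search loop concrete, the hedged sketch below combines the illustrative helpers from Sections 3.2 and 3.3 (init_population, ssa_step, and cart_fitness, which are hypothetical names, not the authors' code) with the settings stated above; X_train, y_train, X_valid, and y_valid are assumed to be the standardized splits, and the search is restricted to the single MinLeafSize dimension for simplicity.

```python
# Hedged usage sketch: SSA search over MinLeafSize with pop = 50, 100 iterations,
# and bounds [0, 30]; candidate values below 1 are rounded up inside the fitness.
import numpy as np

rng = np.random.default_rng(0)
fitness = lambda x: cart_fitness(max(1.0, x[0]), X_train, y_train, X_valid, y_valid)

pop = init_population(N=50, D=1, lb=0.0, ub=30.0, rng=rng)
best, best_fit = None, np.inf
for _ in range(100):                                   # maximum number of iterations
    pop = ssa_step(pop, fitness, lb=0.0, ub=30.0, T=100, rng=rng)
    fits = np.array([fitness(x) for x in pop])
    if fits.min() < best_fit:
        best_fit, best = float(fits.min()), pop[fits.argmin()].copy()

print("optimal MinLeafSize ≈", int(round(max(1.0, best[0]))))
```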
Finally, the model is trained using the optimized hyperparameters and evaluated on the test set. The evaluation metrics MAE, MAPE, MSE, RMSE, and R2 are used to assess the feasibility and efficacy of the SSA-CART model. Its predictions are compared with those of a single decision tree and of decision trees optimized with the well-known SA and WSO algorithms; the predicted results are shown in Table 5.
According to Table 5, the outcomes for each model are provided. As can be observed, when merely utilizing the conventional decision tree model, the test results for R2 are 7.36 × 10−1, which is not ideal. However, after using other optimization techniques, the MAE, MAPE, MSE, and RMSE are significantly decreased, and the R2 has somewhat improved.

5.2. Comparative Experiment

In this experiment, the results of the SA-optimized and WSO-optimized models do not differ significantly from each other and fall short of the SSA-optimized decision tree model; by contrast, the decision tree optimized with the SSA shows high prediction accuracy and is much better than the traditional decision tree model: its R2 is significantly improved, the prediction accuracy is improved by 9.236%, MAE is reduced by 8.65 × 10−2, MAPE by 3.21 × 10−3, MSE by 1.04 × 10−1, and RMSE by 1.0581 × 10−1. The advantage of the SSA-CART optimization model over the WSO-CART and SA-CART models is also evident. Additionally, the SA optimization model is better than the WSO optimization model, but less successful than SSA (Sparrow Search Algorithm) optimization.
The MSE of the SSA algorithm decreased by 3.58 × 10−2 compared to the WSO algorithm, and the MSE of the WSO algorithm decreased by 4.94 × 10−3 compared to the SA algorithm. The specific reduction indicators for the parameters are shown in Figure 5.
Regarding the indicators displayed in Figure 5, the figure demonstrates that the SSA optimization model has an outstanding level of fit and is accurate in predicting pork prices. Figure 6 depicts the decision tree coefficients of the various models as well as the prediction time of each model; it can be seen that after applying the different optimization algorithms, the prediction performance of the decision tree is clearly improved. Optimizing the decision tree with an optimization algorithm sacrifices some time but improves accuracy, and the accuracy of the pork price prediction model optimized with the SSA improves most significantly.
It is also worth noting that the WSO optimization model required the least amount of time to train, taking just 0.074 s longer than a single decision tree. Moreover, the WSO optimization model requires 0.05 s less time than the SSA optimization approach; although the SSA optimization takes an additional 50 ms, the outcome is very reasonable, and the value of the minimum leaf size is five in the SSA optimization model, which considerably decreases the danger of overfitting and gives the decision tree more powerful regression capacity for improved prediction. Table 6 shows the values of the hyperparameters optimized with the different optimization methods.
Figure 7 and Figure 8 exhibit comparison charts of predicted values and real values during the testing phase; the values predicted by a single decision tree are not particularly close to the true values, whereas the models optimized with the SA algorithm and the WSO algorithm are more accurate, with the WSO-optimized CART model being the more accurate of the two.
In the combined models, the Sparrow Search Algorithm in this study performs the best of all models, achieving the closest approximation to the real values; the predictive power of the optimized decision tree outperforms that of individual decision trees.

5.3. Discussion of Result

The purpose of decision tree learning is to generate a tree with strong generalization ability. A root node, several internal nodes, and some leaf nodes make up a decision tree, and the leaf nodes correspond to the decision outcomes, while the root node contains the entire sample set [66].
Decision tree regression is likewise based on the ‘Things of one kind come together’ premise [67], and the decision route moves to the left if the condition is less than or equal to the threshold; otherwise, it moves to the right. Figure 9 shows the tree structure of an unoptimized decision tree in the pork price prediction model.
As Figure 9 shows, a full binary tree has 51 nodes, 26 leaf nodes, and 25 internal nodes. It can be seen that the unoptimized decision tree separates every data point (the more nodes, the darker the color).
The decision tree optimized using the SSA algorithm is a full binary tree with two layers and seven nodes, comprising one root node, two internal nodes, and four leaf nodes. Since decision tree models are prone to overfitting, users must limit the potential risk of overfitting by modifying the model parameters. Figure 10 visually presents the decision tree models enhanced with the several intelligent optimization algorithms; it can be observed that adding optimization methods to the decision tree can greatly lower the risk of overfitting and improve the model's predictive ability.
Figure 10a shows the visualization results after applying the SSA optimization algorithm. The left side is the tree generated by the CART regression, and the right side is the corresponding regression tree with the number of samples in each node visualized; the right side offers the most intuitive way to understand how CART regresses the pork price data. The optimized hyperparameter, the minimum number of samples per leaf node, is five, which means that each terminal (leaf) node of the tree must contain at least five data instances. In other words, it constrains the growth of the tree: if a split would produce a leaf node with fewer than five instances, that branch is not split further. The purpose of setting the minimum leaf size is to prevent overfitting and to control the complexity of the decision tree model. The sample counts in the leaves are 5, 7, 5, and 9, respectively, and a darker color indicates more data. In the first leaf node on the left, squared_error = 1.121 denotes a mean squared error value of 1.121.
Squared_error (MSE) is a metric used to measure the degree of difference between predicted and true values. It calculates the difference between each predicted value and its corresponding true value, squares it, and takes the average over all samples. A smaller MSE indicates a more accurate model prediction. In this case, the mean value is 25.892 in these five samples. In Figure 10, the other leaves can be referred to in the same way as in subfigure (a), based on the same principle.

6. Conclusions

In this paper, aiming at accurately predicting weekly pork prices, we proposed combining an intelligent optimization algorithm and a machine learning model to build a pork price prediction model based on the SSA algorithm and the CART regression model. We offered a useful recommendation regarding overfitting decision trees: we assumed that the global optimal search capability of the Sparrow Search Algorithm would be able to find the optimal parameters for the decision tree, and we verified this assumption through our experiments. Adopting the SSA algorithm to optimize the hyperparameters effectively reduced the overfitting of the decision tree model, avoided local optima, and enhanced the generalization ability and robustness of the CART model. To further explain why the SSA is the best optimization method for the decision tree pork price prediction model, we examined the performance of the models optimized via the SSA algorithm, the SA algorithm, and the WSO algorithm; the results indicate that the prediction performance of the SA- and WSO-optimized models is not particularly excellent. Instead, the CART model optimized by the SSA algorithm has demonstrated its efficiency in accurately forecasting fluctuations in pork prices.
Overall, this study makes an important contribution to addressing the limitations of traditional decision trees and the poor interpretability of prediction models. In accurately predicting pork prices, the SSA-CART model demonstrates promising application prospects for improving machine learning model performance and offers a useful idea for future research in this area.
In the future, more pork data will be evaluated using the SSA-CART model, and we plan to employ a multi-perspective learning technique to incorporate other factors affecting pork price fluctuations, such as the prices of beef, chicken, lamb, and other meats [68]. On the other hand, we plan to investigate how to combine intelligent optimization algorithms and other machine learning techniques more efficiently and effectively.

Author Contributions

J.Q. implemented all proposed methods and conducted the experiments. D.Y. and W.Z. oversaw the study and contributed to the editing and review of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Scientific and Technological Research Program of Chongqing Municipal Education Commission, grant number: KJZD-M202300502.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available in the article.

Acknowledgments

We would like to express our heartfelt appreciation to the Scientific and Technological Research Program of the Chongqing Municipal Education Commission, which supported our research and provided financial assistance for this article. Without this generous support, this study would not have been possible. The financial support provided by the project has been instrumental in conducting our research, analyzing the data, and disseminating the findings. We are truly grateful for their investment in our work and their commitment to advancing scientific knowledge in our field.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Barkan, O.; Benchimol, J.; Caspi, I.; Cohen, E.; Hammer, A.; Koenigstein, N. Forecasting CPI Inflation Components with Hierarchical Recurrent Neural Networks. Int. J. Forecast. 2023, 39, 1145–1162. [Google Scholar] [CrossRef]
  2. Soliman, A.M.; Lau, C.K.; Cai, Y.; Sarker, P.K.; Dastgir, S. Asymmetric Effects of Energy Inflation, Agri-Inflation and CPI on Agricultural Output: Evidence from NARDL and SVAR Models for the UK. Energy Econ. 2023, 126, 106920. [Google Scholar] [CrossRef]
  3. Liu, Q.W. Price Relations among Hog, Corn, and Soybean Meal Futures. J. Future Mark. 2005, 25, 491–514. [Google Scholar] [CrossRef]
  4. Li, J.; Liu, W.; Song, Z. Sustainability of the Adjustment Schemes in China’s Grain Price Support Policy—An Empirical Analysis Based on the Partial Equilibrium Model of Wheat. Sustainability 2020, 12, 6447. [Google Scholar] [CrossRef]
  5. Vu, T.N.; Ho, C.M.; Nguyen, T.C.; Vo, D.H. The Determinants of Risk Transmission between Oil and Agricultural Prices: An IPVAR Approach. Agriculture 2020, 10, 120. [Google Scholar] [CrossRef]
  6. Lin, F.; Li, X.; Jia, N.; Feng, F.; Huang, H.; Huang, J.; Fan, S.; Ciais, P.; Song, X.-P. The Impact of Russia-Ukraine Conflict on Global Food Security. Glob. Food Secur. 2023, 36, 100661. [Google Scholar] [CrossRef]
  7. El Montasser, G.; Belhoula, M.M.; Charfeddine, L. Co-Explosivity versus Leading Effects: Evidence from Crude Oil and Agricultural Commodities. Resour. Policy 2023, 81, 103331. [Google Scholar] [CrossRef]
  8. Htun, H.H.; Biehl, M.; Petkov, N. Survey of Feature Selection and Extraction Techniques for Stock Market Prediction. Financ. Innov. 2023, 9, 26. [Google Scholar] [CrossRef]
  9. Jelić Milković, S.; Lončarić, R.; Kralik, I.; Kristić, J.; Crnčan, A.; Djurkin Kušec, I.; Canavari, M. Consumers’ Preference for the Consumption of the Fresh Black Slavonian Pig’s Meat. Foods 2023, 12, 1255. [Google Scholar] [CrossRef]
  10. García Márquez, F.P.; Peinado Gonzalo, A. A Comprehensive Review of Artificial Intelligence and Wind Energy. Arch. Comput. Methods Eng. 2022, 29, 2935–2958. [Google Scholar] [CrossRef]
  11. Wang, J.; Zhu, S. A Multi-Factor Two-Stage Deep Integration Model for Stock Price Prediction Based on Intelligent Optimization and Feature Clustering. Artif. Intell. Rev. 2023, 56, 7237–7262. [Google Scholar] [CrossRef]
  12. Ismail, W.N.; Alsalamah, H.A.; Hassan, M.M.; Mohamed, E. AUTO-HAR: An Adaptive Human Activity Recognition Framework Using an Automated CNN Architecture Design. Heliyon 2023, 9, e13636. [Google Scholar] [CrossRef] [PubMed]
  13. Lu, W.; Rui, H.; Liang, C.; Jiang, L.; Zhao, S.; Li, K. A Method Based on GA-CNN-LSTM for Daily Tourist Flow Prediction at Scenic Spots. Entropy 2020, 22, 261. [Google Scholar] [CrossRef] [PubMed]
  14. Özdem, K.; Özkaya, Ç.; Atay, Y.; Çeltikçi, E.; Börcek, A.; Demirezen, U.; Sağıroğlu, Ş. A Ga-Based Cnn Model for Brain Tumor Classification. In Proceedings of the 2022 7th International Conference on Computer Science and Engineering (UBMK), Diyarbakir, Turkey, 14–16 September 2022; pp. 418–423. [Google Scholar]
  15. Sadr, M.A.M.; Zhu, Y.; Hu, P. Multivariate Variance-Based Genetic Ensemble Learning for Satellite Anomaly Detection. IEEE Trans. Veh. Technol. 2023, 72, 14155–14165. [Google Scholar] [CrossRef]
  16. Liang, H.; Jiang, K.; Yan, T.-A.; Chen, G.-H. XGBoost: An Optimal Machine Learning Model with Just Structural Features to Discover MOF Adsorbents of Xe/Kr. ACS Omega 2021, 6, 9066–9076. [Google Scholar] [CrossRef] [PubMed]
  17. Parmezan, A.R.S.; Souza, V.M.; Batista, G.E. Evaluation of Statistical and Machine Learning Models for Time Series Prediction: Identifying the State-of-the-Art and the Best Conditions for the Use of Each Model. Inf. Sci. 2019, 484, 302–337. [Google Scholar] [CrossRef]
  18. Gen, M.; Lin, L. Genetic Algorithms and Their Applications. In Springer Handbook of Engineering Statistics; Pham, H., Ed.; Springer Handbooks; Springer: London, UK, 2023; pp. 635–674. ISBN 978-1-4471-7502-5. [Google Scholar]
  19. Kuo, R.J.; Li, S.-S. Applying Particle Swarm Optimization Algorithm-Based Collaborative Filtering Recommender System Considering Rating and Review. Appl. Soft Comput. 2023, 135, 110038. [Google Scholar] [CrossRef]
  20. Huang, W.; Xu, J. Particle Swarm Optimization. In Optimized Engineering Vibration Isolation, Absorption and Control; Springer: Berlin/Heidelberg, Germany, 2023; pp. 15–24. [Google Scholar]
  21. Liu, Y.; Heidari, A.A.; Cai, Z.; Liang, G.; Chen, H.; Pan, Z.; Alsufyani, A.; Bourouis, S. Simulated Annealing-Based Dynamic Step Shuffled Frog Leaping Algorithm: Optimal Performance Design and Feature Selection. Neurocomputing 2022, 503, 325–362. [Google Scholar] [CrossRef]
  22. Zhou, X.; Ma, H.; Gu, J.; Chen, H.; Deng, W. Parameter Adaptation-Based Ant Colony Optimization with Dynamic Hybrid Mechanism. Eng. Appl. Artif. Intell. 2022, 114, 105139. [Google Scholar] [CrossRef]
  23. Chakraborty, S.; Saha, A.K.; Chakraborty, R.; Saha, M. An Enhanced Whale Optimization Algorithm for Large Scale Optimization Problems. Knowl.-Based Syst. 2021, 233, 107543. [Google Scholar] [CrossRef]
  24. Li, Z.M.; Xu, S.W.; Cui, L.G.; Li, G.Q.; Dong, X.X.; Wu, J.Z. The Short-Term Forecast Model of Pork Price Based on CNN-GA. Adv. Mater. Res. 2013, 628, 350–358. [Google Scholar] [CrossRef]
  25. Sun, Y.; Xue, B.; Zhang, M.; Yen, G.G.; Lv, J. Automatically Designing CNN Architectures Using the Genetic Algorithm for Image Classification. IEEE Trans. Cybern. 2020, 50, 3840–3854. [Google Scholar] [CrossRef] [PubMed]
  26. Ishaq, A.; Asghar, S.; Gillani, S.A. Aspect-Based Sentiment Analysis Using a Hybridized Approach Based on CNN and GA. IEEE Access 2020, 8, 135499–135512. [Google Scholar] [CrossRef]
  27. Pan, B.; Song, X.; Xu, J.; Sui, D.; Xiao, H.; Zhou, J.; Gu, J. Accelerated Inverse Design of Customizable Acoustic Metaporous Structures Using a CNN-GA-Based Hybrid Optimization Framework. Appl. Acoust. 2023, 210, 109445. [Google Scholar] [CrossRef]
  28. Liu, Y.; Duan, Q.; Wang, D.; Zhang, Z.; Liu, C. Prediction for Hog Prices Based on Similar Sub-Series Search and Support Vector Regression. Comput. Electron. Agric. 2019, 157, 581–588. [Google Scholar] [CrossRef]
  29. Billings, T.A. Psychoacoustical Dissonance as a Tool for Musical Analysis. 2023. Available online: https://adambillings.org/essays/Billings%20-%20Psychoacoustical%20Dissonance%20as%20a%20Tool%20for%20Musical%20Analysis.pdf (accessed on 8 October 2023).
  30. Paparoditis, E.; Shang, H.L. Bootstrap Prediction Bands for Functional Time Series. J. Am. Stat. Assoc. 2023, 118, 972–986. [Google Scholar] [CrossRef]
  31. Wang, H.; Li, G.; Wang, Z. Fast SVM Classifier for Large-Scale Classification Problems. Inf. Sci. 2023, 642, 119136. [Google Scholar] [CrossRef]
  32. Duan, Q.; Zhang, L.; Wei, F.; Xiao, X.; Wang, L. Forecasting Model and Validation for Aquatic Product Price Based on Time Series GA-SVR. Trans. Chin. Soc. Agric. Eng. 2017, 33, 308–314. [Google Scholar]
  33. Chen, L.; Wu, T.; Wang, Z.; Lin, X.; Cai, Y. A Novel Hybrid BPNN Model Based on Adaptive Evolutionary Artificial Bee Colony Algorithm for Water Quality Index Prediction. Ecol. Indic. 2023, 146, 109882. [Google Scholar] [CrossRef]
  34. Chuluunsaikhan, T.; Yoo, K.-H.; Rah, H.; Nasridinov, A. Pork Price Prediction Using Topic Modeling and Feature Scoring Method. In Advances in Intelligent Information Hiding and Multimedia Signal Processing; Pan, J.-S., Li, J., Ryu, K.H., Meng, Z., Klasnja-Milicevic, A., Eds.; Smart Innovation, Systems and Technologies; Springer: Singapore, 2021; Volume 212, pp. 277–282. ISBN 978-981-336-756-2. [Google Scholar]
  35. Ye, K.; Piao, Y.; Zhao, K.; Cui, X. A Heterogeneous Graph Enhanced LSTM Network for Hog Price Prediction Using Online Discussion. Agriculture 2021, 11, 359. [Google Scholar] [CrossRef]
  36. Zhao, S.; Lin, X.; Weng, X. A Method for Forecasting The Pork Price Based on Fluctuation Forecasting and Attention Mechanism. In Proceedings of the 2022 International Conference on Machine Learning and Cybernetics (ICMLC), Toyama, Japan, 9–11 September 2022; pp. 18–24. [Google Scholar]
  37. Dabin, Z.; Chaomin, C.A.I.; Liwen, L.; Shanying, C. Pork Price Ensemble Prediction Model Based on CEEMD and GA-SVR. J. Syst. Sci. Math. Sci. 2020, 40, 1061. [Google Scholar]
  38. Singh, N.; Tanwar, S.; Kumar, P.; Sharma, A.L.; Yadav, B.C. Advanced Sustainable Solid State Energy Storage Devices Based on FeOOH Nanorod Loaded Carbon@ PANI Electrode: GCD Cycling and TEM Correlation. J. Alloys Compd. 2023, 947, 169580. [Google Scholar] [CrossRef]
  39. Fishman-Jacob, T.; Youdim, M.B.H. A Sporadic Parkinson’s Disease Model via Silencing of the Ubiquitin–Proteasome/E3 Ligase Component, SKP1A. J. Neural Transm. 2023, 8, 1–33. [Google Scholar] [CrossRef]
  40. Masini, R.P.; Medeiros, M.C.; Mendes, E.F. Machine Learning Advances for Time Series Forecasting. J. Econ. Surv. 2023, 37, 76–111. [Google Scholar] [CrossRef]
  41. Liu, C.; Tang, L.; Zhao, C. A Novel Dynamic Operation Optimization Method Based on Multiobjective Deep Reinforcement Learning for Steelmaking Process. IEEE Trans. Neural Netw. Learn. Syst. 2023, 1–15. [Google Scholar] [CrossRef] [PubMed]
  42. Song, H.; Choi, H. Forecasting Stock Market Indices Using the Recurrent Neural Network Based Hybrid Models: CNN-LSTM, GRU-CNN, and Ensemble Models. Appl. Sci. 2023, 13, 4644. [Google Scholar] [CrossRef]
  43. Li, J.; Wen, Y.; He, L. SCConv: Spatial and Channel Reconstruction Convolution for Feature Redundancy. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 6153–6162. [Google Scholar]
  44. Moses, W.S.; Ivanov, I.R.; Domke, J.; Endo, T.; Doerfert, J.; Zinenko, O. High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs. In Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, Montreal, QC, Canada, 25 February–1 March 2023; pp. 119–134. [Google Scholar]
  45. Farid, M.; Palmblad, M.; Hallman, H.; Vänngård, J. A Binary Decision Tree Approach for Pharmaceutical Project Portfolio Management. Decis. Anal. J. 2023, 7, 100228. [Google Scholar] [CrossRef]
  46. Zheng, Y.; Ding, J.; Liu, F.; Wang, D. Adaptive Neural Decision Tree for EEG Based Emotion Recognition. Inf. Sci. 2023, 643, 119160. [Google Scholar] [CrossRef]
  47. Aguilera-Venegas, G.; Roanes-Lozano, E.; Rojo-Martínez, G.; Galán-García, J.L. A Proposal of a Mixed Diagnostic System Based on Decision Trees and Probabilistic Experts Rules. J. Comput. Appl. Math. 2023, 427, 115130. [Google Scholar] [CrossRef]
  48. Hong, J.-S.; Lee, J.; Sim, M.K. Concise Rule Induction Algorithm Based on One-Sided Maximum Decision Tree Approach. Expert Syst. Appl. 2024, 237, 121365. [Google Scholar] [CrossRef]
  49. Ammari, B.L.; Johnson, E.S.; Stinchfield, G.; Kim, T.; Bynum, M.; Hart, W.E.; Pulsipher, J.; Laird, C.D. Linear Model Decision Trees as Surrogates in Optimization of Engineering Applications. Comput. Chem. Eng. 2023, 178, 108347. [Google Scholar] [CrossRef]
  50. Dahiya, Y.; Vignesh, K.; Mahajan, M.; Sreenivasaiah, K. Linear Threshold Functions in Decision Lists, Decision Trees, and Depth-2 Circuits. Inf. Process. Lett. 2024, 183, 106418. [Google Scholar] [CrossRef]
  51. Chou, C.; Liu, Y.-H.; Yang, K.-P. Impacts of Strategic Exploitation and Exploration on Firms’ Survival Likelihood after Crises: A Decision-Tree Analysis. Long Range Plann. 2023, 102374. [Google Scholar] [CrossRef]
  52. Díaz-Ramírez, J.; Estrada-García, J.; Figueroa-Sayago, J. Predicting Imbalanced Transport Mode Choice Preferences in a University District with Decision Tree-Based Models; Elsevier: Amsterdam, The Netherlands, 2023. [Google Scholar] [CrossRef]
  53. Portoleau, T.; Artigues, C.; Guillaume, R. Robust Decision Trees for the Multi-Mode Project Scheduling Problem with a Resource Investment Objective and Uncertain Activity Duration. Eur. J. Oper. Res. 2024, 312, 525–540. [Google Scholar] [CrossRef]
  54. Cao, Y.; Zhao, H.; Liang, G.; Zhao, J.; Liao, H.; Yang, C. Fast and Explainable Warm-Start Point Learning for AC Optimal Power Flow Using Decision Tree. Int. J. Electr. Power Energy Syst. 2023, 153, 109369. [Google Scholar] [CrossRef]
  55. Hosney, H.; Tawfik, M.H.; Duker, A.; van der Steen, P. Prospects for Treated Wastewater Reuse in Agriculture in Low-and Middle-Income Countries: Systematic Analysis and Decision-Making Trees for Diverse Management Approaches. Environ. Dev. 2023, 46, 100849. [Google Scholar] [CrossRef]
  56. Gifford, M.; Bayrak, T. A Predictive Analytics Model for Forecasting Outcomes in the National Football League Games Using Decision Tree and Logistic Regression. Decis. Anal. J. 2023, 8, 100296. [Google Scholar] [CrossRef]
  57. Huang, X.; Zhou, F.; Niu, W.; Li, T.; Lu, Y.; Zhou, Y.; Yin, H.; Yan, C. Multi-Stage Affine Motion Estimation Fast Algorithm for Versatile Video Coding Using Decision Tree. J. Vis. Commun. Image Represent. 2023, 96, 103910. [Google Scholar] [CrossRef]
  58. Sulandari, W.; Subanar, S.; Suhartono, S.; Utami, H.; Lee, M.H.; Rodrigues, P.C. SSA-Based Hybrid Forecasting Models and Applications. Bull. Electr. Eng. Inform. 2020, 9, 2178–2188. [Google Scholar] [CrossRef]
  59. Wang, T.; Wang, B.; Shen, Y.; Zhao, Y.; Li, W.; Yao, K.; Liu, X.; Luo, Y. Accelerometer-Based Human Fall Detection Using Sparrow Search Algorithm and Back Propagation Neural Network. Measurement 2022, 204, 112104. [Google Scholar] [CrossRef]
  60. Tabatabaei, S.M.; Attari, N.; Panahi, S.A.; Asadian-Pakfar, M.; Sedaee, B. EOR Screening Using Optimized Artificial Neural Network by Sparrow Search Algorithm. Geoenergy Sci. Eng. 2023, 229, 212023. [Google Scholar] [CrossRef]
  61. Yao, Z.; Wang, Z.; Wang, D.; Wu, J.; Chen, L. An Ensemble CNN-LSTM and GRU Adaptive Weighting Model Based Improved Sparrow Search Algorithm for Predicting Runoff Using Historical Meteorological and Runoff Data as Input. J. Hydrol. 2023, 625, 129977. [Google Scholar] [CrossRef]
  62. Fu, B.; Wang, W.; Li, Y.; Peng, Q. An Improved Neural Network Model for Battery Smarter State-of-Charge Estimation of Energy-Transportation System. Green Energy Intell. Transp. 2023, 2, 100067. [Google Scholar] [CrossRef]
  63. Peng, T.; Fu, Y.; Wang, Y.; Xiong, J.; Suo, L.; Nazir, M.S.; Zhang, C. An Intelligent Hybrid Approach for Photovoltaic Power Forecasting Using Enhanced Chaos Game Optimization Algorithm and Locality Sensitive Hashing Based Informer Model. J. Build. Eng. 2023, 78, 107635. [Google Scholar] [CrossRef]
  64. Xu, X.; Wang, J.; Wu, J.; Qu, Q.; Ran, Y.; Tan, Z.; Luo, M. Full-Waveform LiDAR Echo Decomposition Method Based on Deep Learning and Sparrow Search Algorithm. Infrared Phys. Technol. 2023, 130, 104613. [Google Scholar] [CrossRef]
  65. Amankwah-Amoah, J.; Chen, X.; Wang, X.; Khan, Z.; Chen, J. Overcoming Institutional Voids as a Pathway to Becoming Ambidextrous: The Case of China’s Sichuan Telecom. Long Range Plann. 2019, 52, 101871. [Google Scholar] [CrossRef]
  66. Costa, V.G.; Pedreira, C.E. Recent Advances in Decision Trees: An Updated Survey. Artif. Intell. Rev. 2023, 56, 4765–4800. [Google Scholar] [CrossRef]
  67. Han, X.; Zhu, X.; Pedrycz, W.; Li, Z. A Three-Way Classification with Fuzzy Decision Trees. Appl. Soft Comput. 2023, 132, 109788. [Google Scholar] [CrossRef]
  68. Noda, H.; Kyo, K. Dynamic Relationships among Changes in Prices of Beef, Pork, and Chicken in Japan: A Bayesian Approach. J. Agric. Food Res. 2023, 11, 100464. [Google Scholar] [CrossRef]
Figure 1. The overall framework of the study.
Figure 2. Relationships among the three models.
Figure 3. SSA-CART model workflow.
Figure 4. Heat map of the correlation coefficient matrix for the study variables.
Figure 5. Evaluation metrics.
Figure 6. Comparison of the models' coefficients of determination and running times. (a) Coefficients of determination of the four models; (b) running times of the four models (s).
Figure 7. The prediction results of the models on the test set. (a) CART, (b) SSA-CART, (c) SA-CART, (d) WSO-CART.
Figure 8. Comparison of prices predicted with different models and real prices.
Figure 9. An unoptimized decision tree in the pork price prediction model.
Figure 10. Decision tree visualizations obtained with the three optimization algorithms (the root node splits on feature x[5], the average sow price in the sixth column of the data table; each node reports squared_error (mean squared error), samples, and value). (a) SSA-CART optimization process; (b) SA-CART optimization process; (c) WSO-CART optimization process.
Table 1. Collection of relevant literature on pork price prediction.
Author | Model | Conclusion
Li et al. [25] | CNN-GA | A GA-CNN short-term forecasting model was applied to 1603 daily samples; the relative error between the predicted and real data is less than 0.5%.
Liu et al. [29] | SVM | The experiments showed that combining similar-subsequence search with support vector regression solved the pseudo-periodic problem and achieved higher accuracy.
Chuluunsaikhan et al. [35] | LSTM | Topic modeling (LDA) and Pearson correlation were used to select the top 20 of 862 features, reducing the feature dimensionality for pork price prediction.
Singh et al. [37] | STL-SVR-Alma | The results indicate an overall improvement in pork price forecasting with the STL-SVR-Alma model, which incorporates cyclical price fluctuations, seasonal variations, and irregular fluctuations.
Dabin et al. [38] | CEEMD and GA-SVR | The results highlight that combining the GA algorithm with a hybrid SVR model yields higher prediction accuracy and robustness than a single SVR model.
Table 2. Decision tree algorithm pros and cons.
Advantages
  • Decision trees are straightforward to grasp and interpret.
  • They handle both quantitative and categorical data, as well as discrete and continuous values.
  • They are white-box models with clearly interpretable outcomes: if a given situation is represented in the model, it can be explained with Boolean logic.
Disadvantages
  • Prone to overfitting and to growing overly complicated trees that generalize poorly; this can be mitigated by setting a minimum number of samples per node and limiting the depth of the decision tree (illustrated in the sketch after this table).
  • Unstable: even minor changes in the sample can have a large effect on the resulting tree. This can be handled with ensemble learning.
  • Decision tree learning optimizes the local optimum at each node (greedy splitting) and hence does not guarantee a globally optimal decision tree.
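The mitigations listed above can be made concrete with a short sketch. The following is a minimal illustration, not the authors' implementation, assuming scikit-learn's DecisionTreeRegressor and synthetic stand-in data; a minimum leaf size and a depth cap are used to curb overfitting.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (not the paper's dataset).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(200, 3))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(0.0, 0.3, size=200)

# An unconstrained tree tends to memorize the training data.
deep_tree = DecisionTreeRegressor(random_state=0).fit(X, y)

# A minimum leaf size and a depth limit act as the regularizers
# mentioned in Table 2, trading training fit for generalization.
pruned_tree = DecisionTreeRegressor(min_samples_leaf=5, max_depth=6,
                                    random_state=0).fit(X, y)

print("unconstrained depth:", deep_tree.get_depth())
print("constrained depth:  ", pruned_tree.get_depth())
```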
Table 3. Descriptive statistics of variables.
Variable | Name | Length | Dtype | Minimum | Maximum | Average | Variance
Year | X1 | 257 | float64 | 2011 | 2015 | 2013 | 1.42
Month | X2 | 257 | float64 | 1 | 12 | 6.53 | 3.44
Week | X3 | 257 | float64 | 1 | 53 | 26.86 | 15.06
Average price of hogs at slaughter (CNY/kg) | X4 | 257 | object | 11.01 | 20.37 | 15.35 | 2.06
Average price of piglets (CNY/kg) | X5 | 257 | object | 14.87 | 31.61 | 21.18 | 3.79
Average sow price (CNY/head) | X6 | 257 | object | 1102.83 | 1515.21 | 1277.26 | 89.78
Corn (CNY/kg) | X7 | 257 | object | 2.27 | 2.67 | 2.53 | 0.10
Wheat bran (CNY/kg) | X8 | 257 | object | 1.84 | 2.29 | 2.11 | 0.13
Fattening pig compound feed (CNY/kg) | X9 | 257 | object | 3.01 | 3.59 | 3.43 | 0.15
Pork prices (CNY/kg) | Y | 257 | object | 18.86 | 31.45 | 24.67 | 2.68
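For reference, a summary in the style of Table 3 can be produced in a few lines of pandas. This is a minimal sketch on synthetic stand-in columns, not the paper's data; in the actual study the frame would hold the 257 weekly records with columns X1–X9 and Y.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data spanning the same ranges as two Table 3 rows.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "X7_corn": rng.uniform(2.27, 2.67, size=257),
    "Y_pork_price": rng.uniform(18.86, 31.45, size=257),
})

summary = pd.DataFrame({
    "Length": df.count(),
    "Dtype": df.dtypes.astype(str),
    "Minimum": df.min(),
    "Maximum": df.max(),
    "Average": df.mean(),
    "Variance": df.var(),
})
print(summary.round(2))
```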
Table 4. Information on the datasets at the three stages.
Dataset | Total Samples (Number) | Period (Year/Month/Week)
Total Data | 257 | 2011.1.1~2015.12.53
Training Set | 205 | 2011.1.1~2014.12.52
Validation Set | 26 | 2014.12.53~2015.7.27
Test Set | 26 | 2015.7.28~2015.12.53
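The split in Table 4 is chronological rather than random, so the validation and test windows follow the training window in time. Below is a minimal sketch, not the authors' code, assuming the 257 weekly records are already ordered by year, month, and week.

```python
import numpy as np

n_total = 257                      # weekly records, 2011 week 1 to 2015 week 53
records = np.arange(n_total)       # stand-in for the ordered samples

train = records[:205]              # 2011.1.1 ~ 2014.12.52
validation = records[205:231]      # 2014.12.53 ~ 2015.7.27 (26 samples)
test = records[231:]               # 2015.7.28 ~ 2015.12.53 (26 samples)

print(len(train), len(validation), len(test))   # 205 26 26
```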
Table 5. Experimental results of four models.
Test Set | MAE | MAPE | MSE | RMSE | R2
CART | 0.47351 | 0.017259 | 0.29788 | 0.54579 | 0.73623
SSA-CART | 0.38697 | 0.014049 | 0.19358 | 0.43998 | 0.82859
SA-CART | 0.42771 | 0.015481 | 0.2399 | 0.48979 | 0.78758
WSO-CART | 0.42277 | 0.0153 | 0.237 | 0.48682 | 0.79015
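The five columns of Table 5 follow the usual definitions, with RMSE the square root of MSE (e.g., 0.43998 squared is approximately 0.19358 for SSA-CART) and MAPE reported as a fraction. The following is a minimal sketch, not the authors' code, that computes these metrics for arbitrary prediction vectors.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Return the Table 5 metrics for actual vs. predicted prices."""
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err) / np.abs(y_true))       # fraction, not percent
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MAPE": mape, "MSE": mse, "RMSE": rmse, "R2": r2}

# Illustrative call with made-up prices (not the paper's data).
y_true = np.array([24.1, 24.5, 25.0, 25.3])
y_pred = np.array([23.8, 24.9, 24.7, 25.6])
print({k: round(v, 5) for k, v in evaluate(y_true, y_pred).items()})
```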
Table 6. Optimized hyperparameter values for the different models.
Algorithm | MinLeafSize
SSA-CART | 5
SA-CART | 9
WSO-CART | 7
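Table 6 shows the leaf-size value each optimizer settled on. As a point of reference only, the role of the optimizer can be illustrated with a plain exhaustive search that uses validation-set MAE as the fitness. This is a hedged sketch rather than the SSA, SA, or WSO implementation; it assumes scikit-learn's min_samples_leaf as the counterpart of the paper's MinLeafSize and uses synthetic stand-in data.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the 257 weekly records (six features, one price target).
rng = np.random.default_rng(2)
X = rng.uniform(0.0, 10.0, size=(257, 6))
y = 20.0 + 0.5 * X[:, 4] + rng.normal(0.0, 0.5, size=257)

X_train, y_train = X[:205], y[:205]          # training window
X_val, y_val = X[205:231], y[205:231]        # validation window (26 samples)

best_leaf, best_mae = None, np.inf
for leaf in range(2, 21):                    # candidate leaf sizes
    model = DecisionTreeRegressor(min_samples_leaf=leaf, random_state=0)
    model.fit(X_train, y_train)
    mae = mean_absolute_error(y_val, model.predict(X_val))
    if mae < best_mae:
        best_leaf, best_mae = leaf, mae

print("selected leaf size:", best_leaf, "validation MAE:", round(best_mae, 4))
```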