Article

Using Machine Learning Methods to Predict Consumer Confidence from Search Engine Data

1 Department of Computer Science, Shandong University of Finance and Economics, Jinan 250014, China
2 Department of Management Science and Engineering, Shandong University of Finance and Economics, Jinan 250014, China
3 Agricultural Bank of China Limited Shandong Branch, Jinan 250001, China
* Author to whom correspondence should be addressed.
Sustainability 2023, 15(4), 3100; https://doi.org/10.3390/su15043100
Submission received: 17 November 2022 / Revised: 12 January 2023 / Accepted: 14 January 2023 / Published: 8 February 2023
(This article belongs to the Section Economic and Business Aspects of Sustainability)

Abstract

The consumer confidence index is a leading indicator of regional socioeconomic development. Forecasting it helps anticipate future economic and consumption trends in regional development. In the era of big data, Internet data can reflect current economic trends accurately and in a timely manner. This paper constructs a conceptual framework for the relationship between the consumer confidence index and web search keywords. It employs six machine learning and deep learning models to predict the consumer confidence index: the BP neural network, the convolutional neural network, support vector regression, random forest, the ELMAN neural network, and the extreme learning machine. The study shows that machine learning models predict the consumer confidence index well. Compared with the other models, the BP neural network and the convolutional neural network have lower error indicators and higher accuracy, which can help decision-makers forecast the consumer confidence index. Consumers search for various goods and their prices, as well as for macroeconomic information, to understand market conditions, and this affects both the consumer confidence index and consumption decisions. Therefore, web search data can be used to predict consumer confidence. Future research can extend this approach to the prediction of other macroeconomic indicators. Such forecasts can help promote market consumption and confidence, improve consumption policies, and promote national prosperity.

1. Introduction

Over the past 40 years of reform and opening up, China’s consumer sector has undergone tremendous changes. The consumer market has continued to grow rapidly, and consumption has become the primary driver of China’s economic growth. The consumer confidence index (CCI) is a composite index of consumers’ satisfaction with current economic conditions and their expectations of future economic trends in a country or region. It is a leading indicator of future economic trends and consumption tendencies, and it provides an important reference for national policy formulation, economic forecasting, and other applications. A forecasting study of the CCI that uses frontier methods can therefore help clarify what the index captures and which factors influence it. The interaction between the CCI and other macroeconomic time series in China could then be used to predict China’s macroeconomic trends and changes in greater depth. However, traditional statistical survey data are released with a time lag and at low frequency. This hinders accurate economic expectations and a timely grasp of consumers’ psychological state, and it may lead to deviations in the implementation of the corresponding policies. To compensate for the low frequency and poor timeliness of the monthly CCI, this paper proposes using machine learning algorithms combined with web search data to forecast the index. The aim is to make accurate and timely predictions of consumer behavior and future consumer expectations.
Research on forecasting methods for the consumer confidence index still focuses on traditional econometric methods. Few studies have approached the problem with machine learning and deep learning models, or have considered the spatial characteristics of the data and time-varying information. Building on existing studies, this paper applies various machine learning and deep learning models to predict China’s consumer confidence index. It also analyzes the underlying economic patterns in the context of the Chinese economy.
To improve prediction performance, data selection must suit the chosen model. In the context of big data technology, web search data are timely, convenient to obtain, and sensitive to consumer behavior, so using them as predictor variables offers a new perspective on prediction problems. Web search data can overcome the shortcomings of traditional data, such as small volume and poor timeliness, which helps improve the performance of forecasting models. Therefore, to address the gap in existing research on the consumer confidence index, this paper compares six models: the BP neural network, support vector regression, the extreme learning machine, random forest regression, the ELMAN neural network, and the convolutional neural network.

First, building on existing research, we construct a keyword database of web search terms related to the consumer confidence index from five aspects: macroeconomic factors, household financial situation, commodity supply, financial environment, and employment situation. We then use Pearson cross-correlation analysis and stepwise regression to obtain the final predictor variables. Second, we feed the web search data into the six machine learning and deep learning models listed above, tune their parameters, and explore the changing trend of the consumer confidence index. Third, using several benchmark prediction models and evaluation metrics, we systematically compare the predictions of the models.
Another contribution of this paper is to examine the effectiveness of machine learning methods. The study promotes the application of machine learning in macroeconomic forecasting and offers ideas for the intelligent and digital transformation of such forecasting. As a modern forecasting tool, machine learning helps economists use data that were difficult to obtain in the past. Machine learning tools can make full use of the value of big data, mine the relationships between variables directly, and help identify cause and effect.

2. Literature Review

2.1. Prediction Research Based on Deep Learning

Machine learning methods have improved socioeconomic forecasting, and a growing body of research applies them to macroeconomic forecasting. Research on machine learning forecasting models in economics is still at an exploratory stage, but studies have shown that machine learning has significant advantages in economic forecasting [1]. In a study of monthly changes in housing rents, two machine learning models, XGBoost and LightGBM, outperformed traditional GBDT models, among others [2]. Li et al. [3] constructed a stock return prediction model for fundamental quantitative investment using techniques such as recurrent neural networks and LSTM models. Qing and Chenwei [4] used a deep LSTM model to forecast stock indices over different horizons. Their predictions were consistently stable, and the LSTM model controlled error fluctuations effectively and achieved higher prediction accuracy than the SVR, MLP, and ARIMA models.

2.2. Prediction Research Based on an Online Search Index

The introduction of web search data can overcome the shortcomings of traditional data, such as small volume and poor timeliness [5], which helps improve the performance of prediction models. Web search data were first applied to predicting influenza incidence: Eysenbach [6] synthesized a search index from the search volume of influenza-related keywords and studied its relationship with influenza incidence. Web search data were subsequently applied to macroeconomic indicators. Askitas and Zimmermann [7] and Smith [8] added unemployment-related keywords from Google Trends to unemployment rate prediction models. They found a correlation between the two and showed that search index variables improve prediction accuracy. Zhang et al. [9] used web search keywords to construct a macro situation index and a supply and demand index. They found that changes in attention to both types of search index led to changes in the CPI, and that the model built on web search data had good out-of-sample forecasting ability.
Web search data have subsequently been applied to financial market research. A correlation between the search index and annual stock returns has been established from both theoretical and empirical perspectives [10]. Yang et al. [11] studied the correlation between unexpected events and the stock market’s concept sectors, synthesizing separate market and event search indices from web search data. The results showed that event search indices accurately capture the volatility of the corresponding sectors.
Research on the correlation between Internet searches and behavior has made significant breakthroughs. Xiong et al. [12] used a synthetic search index based on the number of people searching for influenza-related keywords to predict influenza incidence and mortality; this index has a stable long-term relationship with influenza incidence. Choi and Varian [13] studied the relationship between sales volume and web search data in four U.S. industries: automobiles and parts, retail, travel, and real estate. They added keyword search volume as a new factor to the traditional autoregressive model and built new sales prediction models based on a seasonal autoregressive model and a fixed-effects model. Zhang et al. [9] built a model that predicts the national CPI from search data, achieving a model fit of 0.978 with a one-month lead time. Fang et al. [14] compiled a search index from the Baidu index to construct hard-to-quantify indicators, such as investor sentiment, and studied their impact on exchange rate changes. For the real estate market, Tang et al. [5] used Baidu search data on second-hand houses in Beijing to forecast second-hand house prices there. Web search data are also widely used for tourism indices, tourist volume forecasting, and hotel occupancy forecasting, as in the studies by Lijun [15] and Tang et al. [5].

2.3. Factors Influencing Consumer Confidence

Most studies of the factors that influence consumer confidence find that the consumer confidence index is mainly related to prices, income, employment, economic conditions, investment, real estate, and manufacturing. Qiu [16] studied the relationship between changes in overall economic conditions and the CCI. Qiu argued that dynamic structural changes in the index reflect changes in macroeconomic operating conditions. Shayaa et al. [17] found that changes in producer prices have a negative effect on the CCI and that the effect strengthens over time. Juhro et al. [18] studied the factors influencing the living conditions index. They found that prices were the most important short-term factor and that consumers at all income levels were most concerned about price changes in food and housing. The CPI is negatively related to consumer confidence, and among industry factors, changes in the composite indices of manufacturing and finance have a positive impact on consumer confidence [19]. Ekici et al. [20] examined how age, marital status, employment status, income, and gender relate to consumer confidence and found that all of these factors except gender have significant effects.
A large literature examines the correlation between Internet searches and socioeconomic behavior, and another uses machine learning methods to predict economic indicators. Researchers have also studied the factors that influence the consumer confidence index extensively. However, there is still no mature method for processing keyword data. Most studies synthesize a search index as a simple arithmetic average of all keyword series, and the weights that different keywords should receive need further study. Little of this literature analyzes the consumer confidence index in depth using keyword searches across various aspects and fields in the Chinese context.

3. Research Method

3.1. Research Model

Machine learning is a subfield of artificial intelligence. It is the general term for algorithms that identify patterns in data and use them to accomplish tasks such as prediction, classification, and clustering. In practice, machine learning uses data to build and train models that typically perform prediction, classification, or data generation. Machine learning methods can be divided into supervised and unsupervised learning. In supervised learning, the target (response) variable is known for the training samples, so the model learns a mapping from the explanatory variables to labeled outputs. In unsupervised learning, no target variable is available, and the model looks for structure in the explanatory variables alone.
The convolutional neural network (CNN) is a class of feedforward neural networks that include convolutional computation. It is one of the representative algorithms of deep learning [1,2] and can be applied to both supervised and unsupervised learning. A CNN contains four types of layers: input, output, convolutional, and pooling layers. Like a fully connected network, it is built by stacking multiple layers. Compared with traditional neural networks of the same size, CNNs have fewer trainable parameters, require less computation, and are more adaptable, which reduces model complexity and the risk of overfitting.
The convolutional and pooling layers can be combined freely in the hidden layers to achieve the best performance for a specific task. A convolutional layer shares the parameters of each convolution kernel across all positions of the input, which greatly reduces the number of model parameters. A pooling layer downsamples the feature maps, retaining the most salient features and further reducing the number of parameters in subsequent layers. CNNs therefore have far fewer parameters than fully connected networks, which makes them easier to train.
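The following Python sketch makes the parameter savings concrete. It counts the trainable weights of a one-dimensional convolutional layer and of a fully connected layer that produces an output of the same size. The layer sizes (52 weekly steps, 8 keyword channels, 16 filters, kernel size 3) are hypothetical and chosen only for illustration; they are not the configuration used in this paper.

```python
def conv1d_param_count(in_channels: int, out_channels: int, kernel_size: int, bias: bool = True) -> int:
    """Weights of a 1-D convolutional layer: one kernel per (input, output) channel
    pair, shared across every position of the sequence, plus one bias per filter."""
    return out_channels * (in_channels * kernel_size + (1 if bias else 0))


def dense_param_count(in_features: int, out_features: int, bias: bool = True) -> int:
    """Weights of a fully connected layer: every input connects to every output."""
    return out_features * (in_features + (1 if bias else 0))


# Hypothetical input: 52 weekly steps x 8 keyword series.
seq_len, channels, filters, kernel = 52, 8, 16, 3
out_len = seq_len - kernel + 1  # length of a "valid" convolution output (50)

conv = conv1d_param_count(channels, filters, kernel)
# A dense layer mapping the flattened input to an output of the same size.
dense = dense_param_count(seq_len * channels, out_len * filters)
print(conv, dense)
```

In this example the convolutional layer needs 400 weights and the dense layer needs 333,600. The convolutional count does not depend on the sequence length, because the same kernel is reused at every position. The dense count grows with both the input size and the output size.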
A CNN is essentially a mapping from inputs to outputs, and training generally consists of two phases. Phase 1 is forward propagation: a sample is fed into the network, and the corresponding actual output is computed as information passes from the input layer to the output layer through successive transformations. Phase 2 is backward propagation: the error between the actual output and the target output is propagated back through the network, and the weights are adjusted to reduce it. The convolution operation is defined as follows:
$$\gamma_t = \sum_{k=1}^{n} w_k \, x_{t-k+1} + b$$
where $\gamma_t$ is the output feature at position $t$, $w_k$ is the $k$-th weight of the convolution kernel, $x_{t-k+1}$ is the input data, $b$ is the bias, and $n$ is the length of the convolution kernel.
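The convolution equation and the two training phases described above can be illustrated with a minimal NumPy sketch. `conv_forward` evaluates $\gamma_t$ at every position where the kernel fits entirely inside the input, which matches `np.convolve(x, w, mode="valid") + b`. `train_kernel` fits a single kernel by gradient descent on the mean squared error, alternating a forward pass with a backward pass. The function names, learning rate, and number of epochs are illustrative assumptions. A real CNN stacks many such kernels together with nonlinearities, pooling, and fully connected layers.

```python
import numpy as np


def conv_forward(x: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """gamma_t = sum_{k=1}^{n} w_k * x_{t-k+1} + b for every t with a full window.

    With 0-based indices, w[i] (i = k - 1) multiplies x[t - i], and output j
    corresponds to t = j + n - 1, so each term is a shifted slice of x.
    """
    n, N = len(w), len(x)
    return sum(w[i] * x[n - 1 - i : N - i] for i in range(n)) + b


def train_kernel(x: np.ndarray, y: np.ndarray, n: int, lr: float = 0.1, epochs: int = 2000):
    """Fit kernel w (length n) and bias b by gradient descent on mean squared error."""
    w, b = np.zeros(n), 0.0
    N, m = len(x), len(x) - n + 1
    for _ in range(epochs):
        # Phase 1 (forward propagation): compute outputs and their errors.
        err = conv_forward(x, w, b) - y
        # Phase 2 (backward propagation): dL/dw_i = (2/m) * sum_j err_j * x[j + n - 1 - i]
        grad_w = np.array([2.0 / m * (err @ x[n - 1 - i : N - i]) for i in range(n)])
        grad_b = 2.0 / m * err.sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Synthetic check: generate data from a known kernel and try to recover it.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
w_true, b_true = np.array([0.5, -0.3, 0.2]), 0.1
y = conv_forward(x, w_true, b_true)

w_hat, b_hat = train_kernel(x, y, n=3)
print(np.round(w_hat, 3), round(b_hat, 3))
```

The data in this example are noiseless, so gradient descent should recover the generating kernel and bias almost exactly.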
A CNN is a special multilayer neural network. It is trained with the same backpropagation algorithm as other neural networks; the difference lies in the network structure. Its connections are characterized by local connectivity and parameter sharing. Local connectivity, in contrast to the full connectivity of an ordinary neural network, means that a node in one layer is connected to only some of the nodes in the previous layer. Parameter sharing means that the connections of multiple nodes in a layer use the same set of parameters. The core of a CNN consists of convolutional, pooling, and fully connected layers, which are combined to form a multilayer network. In both forward and backward propagation, CNNs replace the matrix multiplication between the layers of traditional networks with convolution operations. Their weight-sharing strategy significantly reduces the number of parameters that must be learned.
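The three core structures named above appear together in the minimal NumPy forward pass below. It consists of a convolutional layer with several shared kernels, a ReLU activation, non-overlapping max pooling, and a fully connected output neuron that produces one CCI prediction. The layer sizes, the ReLU activation, and the random weights are assumptions for illustration only, since the CNN configuration is not specified here. Training would update these weights by backpropagation.

```python
import numpy as np


def conv_layer(x: np.ndarray, kernels: np.ndarray, biases: np.ndarray) -> np.ndarray:
    """Apply each shared kernel along x ("valid" convolution): (n_filters, len(x) - n + 1)."""
    return np.stack([np.convolve(x, k, mode="valid") + b for k, b in zip(kernels, biases)])


def max_pool(h: np.ndarray, size: int = 2) -> np.ndarray:
    """Non-overlapping max pooling along the last axis; a trailing remainder is dropped."""
    usable = (h.shape[1] // size) * size
    return h[:, :usable].reshape(h.shape[0], -1, size).max(axis=2)


def cnn_forward(x: np.ndarray, p: dict) -> float:
    h = np.maximum(conv_layer(x, p["kernels"], p["conv_b"]), 0.0)  # convolution + ReLU
    pooled = max_pool(h, 2)                                          # pooling
    flat = pooled.reshape(-1)                                        # flatten
    return float(flat @ p["fc_w"] + p["fc_b"])                       # fully connected output


# Hypothetical sizes: 8 normalized keyword inputs and 4 kernels of length 3.
rng = np.random.default_rng(1)
params = {
    "kernels": rng.normal(scale=0.5, size=(4, 3)),
    "conv_b": np.zeros(4),
    "fc_w": rng.normal(scale=0.5, size=4 * 3),  # 4 filters x (8 - 3 + 1) // 2 pooled steps
    "fc_b": 0.0,
}
x = rng.uniform(size=8)
print(cnn_forward(x, params))
```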

3.2. Variable and Data Source

The data for this study cover the period from January 2014 to July 2022. The predicted variable is China’s consumer confidence index, obtained from CNKI. Because the index is published monthly, the data frequency is low. This leaves too little training data and makes it difficult to portray the dynamics of consumer confidence. The CCI in China has shown no large or sharp fluctuations in recent years, so the monthly data were converted into weekly data by interpolation. The resulting series, denoted $y_t$, presents the dynamics of the Chinese consumer confidence index, which Figure 1 shows. The independent variables are Internet search keywords related to the CCI. They are taken from the Baidu search index, whose keyword search volumes are downloaded at weekly frequency. Data processing proceeds as follows. After noise reduction, feature extraction is performed: Pearson correlation coefficients are calculated first, and stepwise regression from econometrics is then used to obtain the key features for predicting the consumer confidence index.
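The text does not state which interpolation scheme was used to turn the monthly CCI into weekly values. The pandas sketch below assumes linear interpolation in time, with monthly observations stamped on the first day of each month and weeks ending on Sunday. Both conventions are assumptions, and the function name `monthly_to_weekly` is hypothetical.

```python
import pandas as pd


def monthly_to_weekly(monthly: pd.Series, freq: str = "W-SUN") -> pd.Series:
    """Linearly interpolate a monthly series (DatetimeIndex) onto a weekly grid."""
    weekly_index = pd.date_range(monthly.index.min(), monthly.index.max(), freq=freq)
    # Put monthly and weekly dates on one timeline, then fill the weekly gaps
    # in proportion to the time elapsed between neighbouring monthly values.
    combined = monthly.reindex(monthly.index.union(weekly_index)).interpolate(method="time")
    return combined.loc[weekly_index]


# Toy example: two monthly readings 31 days apart (not real CCI values).
cci_monthly = pd.Series([100.0, 131.0], index=pd.to_datetime(["2014-01-01", "2014-02-01"]))
cci_weekly = monthly_to_weekly(cci_monthly)
print(cci_weekly)
```

In the toy data the readings rise by 31 points over 31 days, so consecutive weekly values differ by 7 points. Interpolation only fills in values between observed months. It smooths the series but cannot add information that the monthly survey did not capture.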

3.3. Model Prediction Independent Variable Determination

Consumers often learn about macroeconomic and microeconomic conditions, price trends, and commodity supply and demand through online searches. The information they obtain influences their consumption decisions and expectations, so web search data can be used to predict characteristics related to consumer confidence. Many web search keywords relate to the consumer confidence index. To build an effective set of predictor variables, we review past literature to identify keywords with good predictive power. Keywords are identified and selected in the following three steps:

3.3.1. Construction of Keyword Database

Based on the academic literature and Baidu’s recommended terms, this paper selects 64 benchmark keywords related to the consumer confidence index from five aspects: macroeconomic factors, household financial situation, commodity supply, financial environment, and employment situation. Following the principles used to compile the consumer confidence index, these five aspects form the keyword database shown in Table 1.

3.3.2. Screening of Independent Variables

Some of the raw web search keyword series are noisy, which weakens the predictive performance of the model. This paper therefore first uses econometric methods to screen out keywords that may have predictive power. Pearson cross-correlation analysis, run in Stata, identifies the correlations among the time series variables. The correlation coefficient reflects the correlation between the search volume of each keyword and the CCI: the larger its absolute value, the stronger the linear relationship between the keyword and the CCI. After the influence of noisy data was excluded, 18 keywords with potential predictive power remained, as shown in Table 2.
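The study performs this screening in Stata. The Python sketch below approximates it by computing, for each keyword, the Pearson correlation between its search volume `lag` weeks earlier and the current CCI. It keeps the strongest lag and retains keywords whose absolute correlation reaches a threshold. The maximum lag of 4 weeks, the threshold of 0.5, the function name, and the synthetic data are all assumptions for illustration; the paper does not report these settings.

```python
import numpy as np
import pandas as pd


def screen_keywords(search: pd.DataFrame, cci: pd.Series, max_lag: int = 4,
                    threshold: float = 0.5) -> pd.DataFrame:
    """Keep keywords whose strongest lagged Pearson correlation with the CCI
    (search volume `lag` weeks earlier vs. current CCI) reaches |r| >= threshold."""
    rows = []
    for kw in search.columns:
        best_lag, best_r = 0, 0.0
        for lag in range(max_lag + 1):
            r = search[kw].shift(lag).corr(cci)  # Pearson by default; NaN pairs are dropped
            if abs(r) > abs(best_r):
                best_lag, best_r = lag, r
        rows.append({"keyword": kw, "lag": best_lag, "r": best_r})
    table = pd.DataFrame(rows).set_index("keyword")
    keep = table["r"].abs() >= threshold
    return table[keep].sort_values("r", key=np.abs, ascending=False)


# Synthetic demo: "leading" anticipates the CCI by two weeks, "noise" is unrelated.
rng = np.random.default_rng(42)
cci = pd.Series(np.cumsum(rng.normal(size=200)))
search = pd.DataFrame({
    "leading": cci.shift(-2) + 0.01 * rng.normal(size=200),
    "noise": rng.normal(size=200),
})
print(screen_keywords(search, cci))
```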

3.3.3. Final Model Independent Variable Determination

In a prediction model, different potential predictor variables make different marginal contributions to the prediction. To screen out suitable predictors, stepwise regression was used to determine the final predictor variables, with the significance level set at 1%. This yields eight keywords with good predictive performance: house price, peanut oil price, air conditioner price, tourism, automobile, population, inflation, and oil price. The results are shown in Table 3.
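The following sketch does not reproduce Stata's stepwise procedure exactly. As an approximation, it implements forward selection at the 1% level with NumPy and SciPy. At each step it adds the candidate whose coefficient has the smallest t-test p-value in the enlarged OLS model, and it stops when no remaining candidate is significant at `alpha = 0.01`. Unlike a full bidirectional stepwise procedure, it never removes a variable once it has entered. The function names and the synthetic data are hypothetical.

```python
import numpy as np
from scipy import stats


def ols_pvalues(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """OLS with an intercept; two-sided t-test p-values for the slope coefficients."""
    n = X.shape[0]
    Xc = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ beta
    dof = n - Xc.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Xc.T @ Xc)))
    p = 2 * stats.t.sf(np.abs(beta / se), dof)
    return p[1:]  # drop the intercept


def forward_stepwise(X: np.ndarray, y: np.ndarray, names: list, alpha: float = 0.01) -> list:
    """Add one predictor at a time while its entry p-value is below alpha."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        # p-value of each candidate's coefficient when it joins the current model
        pvals = {j: ols_pvalues(X[:, selected + [j]], y)[-1] for j in remaining}
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:
            break
        selected.append(best)
        remaining.remove(best)
    return [names[j] for j in selected]


# Synthetic data: only x0 and x2 drive y.
rng = np.random.default_rng(7)
X = rng.normal(size=(300, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + 0.5 * rng.normal(size=300)
print(forward_stepwise(X, y, ["x0", "x1", "x2", "x3"]))
```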

3.4. Model Input Set Construction

As Table 3 shows, consumers search the Internet for house prices, peanut oil prices, air conditioner prices, travel, cars, and oil prices. To some extent, this behavior reflects consumers’ concern about purchasing power. The consumption information that consumers obtain online also affects their consumption decisions and expectations. In addition, consumers search for two macroeconomic factors, population and inflation. This reflects their concern about macroeconomic indicators, which in turn affects the consumer confidence index.
In summary, this study uses eight model input variables: house price, peanut oil price, air conditioner price, tourism, automobile, population, inflation, and oil price. The consumer confidence index is the output variable, and the sample size is 448. To remove the influence of differing magnitudes and outliers on prediction performance, the data set was normalized with a sigmoid function; normalization generally helps improve prediction performance [44]. As the prediction models require, the normalized samples are divided into a training set and a test set. The model outputs are then inverse-normalized to obtain the final predicted values.
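The following NumPy sketch is one plausible reading of the sigmoid normalization and its inverse. Each variable is first standardized, because applying the logistic function to raw values such as search volumes in the thousands would saturate at 1. The result is then passed through $1/(1+e^{-z})$, and `inverse_transform` undoes both steps. The standardization step, the class name, the 80/20 chronological split, and the random toy data are assumptions. Here the scaler is fitted on the training portion only, a common safeguard against leaking test-set information, whereas the text describes normalizing before splitting.

```python
import numpy as np


class SigmoidScaler:
    """Standardize each column, then squash it into (0, 1) with the logistic function."""

    def fit(self, data):
        data = np.asarray(data, dtype=float)
        self.mean_ = data.mean(axis=0)
        self.std_ = data.std(axis=0)
        return self

    def transform(self, data):
        z = (np.asarray(data, dtype=float) - self.mean_) / self.std_
        return 1.0 / (1.0 + np.exp(-z))

    def inverse_transform(self, scaled):
        s = np.asarray(scaled, dtype=float)
        z = np.log(s / (1.0 - s))  # logit, the inverse of the logistic function
        return z * self.std_ + self.mean_


def chronological_split(X, y, train_frac=0.8):
    """Split time-ordered samples without shuffling: earlier weeks train, later weeks test."""
    cut = int(len(X) * train_frac)
    return X[:cut], X[cut:], y[:cut], y[cut:]


# Toy data shaped like the study's sample: 448 weeks x 8 keywords (values are random).
rng = np.random.default_rng(3)
X = rng.uniform(100, 5000, size=(448, 8))
y = rng.uniform(90, 130, size=448)

X_train, X_test, y_train, y_test = chronological_split(X, y)
x_scaler = SigmoidScaler().fit(X_train)
y_scaler = SigmoidScaler().fit(y_train)
X_train_s, X_test_s = x_scaler.transform(X_train), x_scaler.transform(X_test)
# A model would be trained on (X_train_s, y_scaler.transform(y_train)) at this point.
y_restored = y_scaler.inverse_transform(y_scaler.transform(y_test))
print(X_train.shape, X_test.shape, np.allclose(y_restored, y_test))
```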

4. Results

The machine learning and deep learning models predict the consumer confidence index from web search index data. Table 4 shows the true values of the index and the values predicted by the six models at different dates. In the short term, the predictions of the BP neural network and the convolutional neural network are closest to the true values, and these models are more robust. In the medium term, the CNN is closest to the true values. In the long term, the BP neural network, the convolutional neural network, random forest regression, and support vector regression all predict well.
Table 5 shows the errors of the machine learning and deep learning models for different periods. In both the short and the long term, the BP neural network and the convolutional neural network have the smallest errors in predicting the consumer confidence index, indicating that they are more accurate and robust.
To compare the prediction performance of the models, this study uses two metrics: the mean absolute error (MAE) and the root-mean-square error (RMSE). MAE is the average of the absolute errors and reflects the actual magnitude of the prediction errors. RMSE is the square root of the mean of the squared errors. It measures the typical deviation between predicted and observed values and penalizes large errors more heavily than MAE does. Both are important indicators of how far predictions deviate from true values: the smaller they are, the more accurate the model. Table 6 shows the error metrics of the models.
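For reference, the two metrics can be written as $\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m} |y_i - \hat{y}_i|$ and $\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m} (y_i - \hat{y}_i)^2}$. Here $y_i$ and $\hat{y}_i$ are the observed and predicted values and $m$ is the number of test samples; these symbols are introduced only for illustration. A short Python version follows. The root mean square of a set of numbers is never smaller than their mean absolute value, so RMSE is always at least as large as MAE on the same predictions. The gap widens when a few errors are large.

```python
import numpy as np


def mae(y_true, y_pred) -> float:
    """Mean absolute error: the average size of the prediction errors."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(err)))


def rmse(y_true, y_pred) -> float:
    """Root-mean-square error: the square root of the mean squared prediction error."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


# Toy example: one error of size 2 among three predictions.
observed, predicted = [1.0, 2.0, 3.0], [1.0, 2.0, 5.0]
print(mae(observed, predicted), rmse(observed, predicted))
```

In this example MAE is 2/3 and RMSE is $\sqrt{4/3} \approx 1.155$. The single large error pulls RMSE well above MAE.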
In terms of accuracy, the MAE and RMSE of the BP neural network model are 0.339169 and 0.193975, respectively. Those of the convolutional neural network model are 0.363376 and 0.26007, respectively. Both models have smaller errors than the random forest, support vector regression, ELMAN neural network, and extreme learning machine models. This indicates that the BP neural network and the convolutional neural network achieve higher accuracy when predicting the consumer confidence index from search terms.
The analysis above shows that search keyword data predict the consumer confidence index well. This result follows from constructing the keyword database, screening indicators with correlation coefficients, and building various machine learning and deep learning models. Among the models, the BP neural network and the convolutional neural network are the most accurate. Chinese consumers learn about the domestic market environment through everyday goods and their prices, such as house prices, peanut oil prices, air conditioner prices, travel, and automobiles. This information shapes their confidence in the market and can affect their final consumption decisions. By searching for macro factors such as population, inflation, and oil prices, consumers also come to understand the domestic macroeconomic and market environment. They then form judgments about the economic situation, which in turn influence the consumer confidence index.

5. Conclusions

In this paper, we applied various machine learning and deep learning methods to capture the nonlinear variation of the consumer confidence index. We built a keyword database of web search data covering five aspects of consumer confidence: the macroeconomic environment, commodity supply, household financial situation, employment situation, and financial environment. We then constructed various machine learning and deep learning models to predict the consumer confidence index. The following conclusions and insights were obtained.
First, the consumer confidence index is a key indicator that gives a relatively comprehensive picture of consumers’ overall judgment of the current and future economic situation. The results of this study are consistent with consumers using Internet searches to learn about macroeconomic and microeconomic conditions, commodity supply and demand, and price trends. They are also consistent with this online information affecting consumers’ decisions and expectations. Web search data can therefore be used to predict characteristics related to consumer confidence.
Second, the BP neural network and convolutional neural network models constructed in this paper capture the dynamics of consumer confidence more comprehensively and in a more timely manner. They predict the confidence index more effectively than support vector regression, random forest, and the ELMAN neural network, and their MAE and RMSE are generally smaller than those of the other models. They can help the Chinese government forecast the consumer confidence index and provide an effective reference for relevant departments that formulate and implement policy measures.
Third, to promote consumption upgrading and achieve high-quality development of Chinese industries, the government needs to accelerate supply-side structural reform. It also needs to monitor trends in consumer confidence in the economy and the market promptly, and to understand and address the factors that reduce consumer confidence. Drawing on big data from web searches, this paper uses machine learning to build BP neural network and convolutional neural network models that incorporate several macroeconomic indicators. Such models can improve the prediction of consumer confidence trends and of other macroeconomic indicators.
Finally, the BP neural network and convolutional neural network models constructed in this paper offer practical experience for predicting other macroeconomic indicators. Using the same principles and forecasting methods, the approach can be extended to forecasting studies of other macro indicators, which is a research direction worth exploring in the future.
Machine learning algorithms can help predict the Chinese consumer confidence index more accurately, and there are many ways to raise consumer confidence. In the future, forecasts of the index from machine learning models can inform efforts to promote market consumption and confidence, improve consumption policies, and promote national prosperity.

Author Contributions

Conceptualization, H.H.; methodology, H.H.; software, Z.L. (Zhiming Li); validation, Z.L. (Zongwei Li); formal analysis, Z.L. (Zongwei Li); investigation, Z.L. (Zhiming Li); resources, Z.L. (Zongwei Li); data curation, Z.L. (Zongwei Li); writing—original draft preparation, Z.L. (Zhiming Li); writing—review and editing, Z.L. (Zhiming Li); visualization, Z.L. (Zongwei Li); supervision, H.H.; project administration, H.H.; funding acquisition, H.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The Chinese Consumer Confidence Index data were obtained from the CNKI database. Baidu search index data are available at index.baidu.com.

Acknowledgments

The authors gratefully acknowledge the institutional support provided by the Shandong University of Finance and Economics.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Kleinberg, J.; Lakkaraju, H.; Leskovec, J.; Ludwig, J.; Mullainathan, S. Human Decisions and Machine Predictions. Soc. Sci. Electron. Publ. 2018, 133, 237–293.
2. Ming, Y.; Zhang, J.; Qi, J.; Liao, T.; Wang, M.; Zhang, L. Prediction and Analysis of Chengdu Housing Rent Based on Xgboost Algorithm. In Proceedings of the 2020 3rd International Conference on Big Data Technologies, Qingdao, China, 18–20 September 2020; pp. 1–5.
3. Li, B.; Shao, X.; Li, Y. Research on Machine Learning Driven Quantamental Investing. China Ind. Econ. 2019, 8, 61–79.
4. Qing, Y.; Chenwei, W. A Study on Forecast of Global Stock Indices Based on Deep LSTM Neural Network. Stat. Res. 2019, 36, 65–77.
5. Tang, X.B.; Zhang, R.; Liu, L.X. Research on Forecast of Second-Hand House Price in Beijing Based on SVR Model of Bat Algorithm. Stat. Res. 2018, 35, 71–81.
6. Eysenbach, G. Infodemiology: Tracking Flu-Related Searches on the Web for Syndromic Surveillance. In Proceedings of the AMIA Annual Symposium Proceedings, Washington, DC, USA, 11–15 November 2006; Volume 2006, p. 244.
7. Askitas, N.; Zimmermann, K.F. Google Econometrics and Unemployment Forecasting. Appl. Econ. Q. 2009, 55, 107–120.
8. Smith, P. Google’s MIDAS Touch: Predicting UK Unemployment with Internet Search Data. J. Forecast. 2016, 35, 263–284.
9. Zhang, C.; Lv, B.; Peng, G.; Liu, Y.; Yuan, Q. A Study on Correlation between Web Search Data and CPI. In Recent Progress in Data Engineering and Internet Technology; Springer: Berlin/Heidelberg, Germany, 2012; pp. 269–274.
10. Ying, L.; Benfu, L.; Geng, P. Predictive Power of Internet Search Data for Stock Market: A Theoretical Analysis and Empirical Test. Econ. Manag. 2011, 33, 172–180.
11. Yang, X.; Ben-Fu, L.V.; Peng, G.; Liu, Y.; Management, S.O. A Study of Emergency on Stock Market Based on Web Search Data Evidence from Yongwen Railway Accident. Math. Pract. Theory 2013, 43, 17–28.
12. Xiong, R.; Nichols, E.P.; Shen, Y. Deep Learning Stock Volatility with Google Domestic Trends. arXiv 2015, arXiv:1512.0491.
13. Choi, H.; Varian, H. Predicting the Present with Google Trends. Econ. Rec. 2012, 88, 2–9.
14. Fang, J.; Gozgor, G.; Lau, C.-K.M.; Lu, Z. The Impact of Baidu Index Sentiment on the Volatility of China’s Stock Markets. Financ. Res. Lett. 2020, 32, 101099.
15. Lijun, L.U. On the Prediction of Tourist Volume Based on Network Search Index and EMD-ARIMA-BP Combination Model: A Case Study of Zhangjiajie. J. Jishou Univ. Social Sci. Ed. 2019, 40, 138.
16. Qiu, Y. Forecasting the Consumer Confidence Index with Tree-Based MIDAS Regressions. Econ. Model. 2020, 91, 247–256.
17. Shayaa, S.; Ainin, S.; Jaafar, N.I.; Zakaria, S.B.; Phoong, S.W.; Yeong, W.C.; Al-Garadi, M.A.; Muhammad, A.; Zahid Piprani, A. Linking Consumer Confidence Index and Social Media Sentiment Analysis. Cogent Bus. Manag. 2018, 5, 1509424.
18. Juhro, S.M.; Iyke, B.N. Consumer Confidence and Consumption Expenditure in Indonesia. Econ. Model. 2020, 89, 367–377.
19. Perić, B.Š.; Sorić, P. A Note on the “Economic Policy Uncertainty Index”. Soc. Indic. Res. 2018, 137, 505–526.
20. Ekici, T. Subjective Financial Distress in the Formation of Consumer Confidence: Evidence from Novel Household Data. Boğaziçi J. 2016, 30, 11.
21. Tiep, N.C.; Wang, M.; Mohsin, M.; Kamran, H.W.; Yazdi, F.A. An Assessment of Power Sector Reforms and Utility Performance to Strengthen Consumer Self-Confidence towards Private Investment. Econ. Anal. Policy 2021, 69, 676–689.
22. Guo, Y.; He, S. Does Confidence Matter for Economic Growth? An Analysis from the Perspective of Policy Effectiveness. Int. Rev. Econ. Financ. 2020, 69, 1–19.
23. Svensson, H.M.; Albæk, E.; Van Dalen, A.; De Vreese, C.H. The Impact of Ambiguous Economic News on Uncertainty and Consumer Confidence. Eur. J. Commun. 2017, 32, 85–99.
24. Cogoljević, D.; Gavrilović, M.; Roganović, M.; Matić, I.; Piljan, I. Analyzing of Consumer Price Index Influence on Inflation by Multiple Linear Regression. Phys. A Stat. Mech. Its Appl. 2018, 505, 941–944.
25. Farmer, R.E.A. The Importance of Beliefs in Shaping Macroeconomic Outcomes. Oxford Rev. Econ. Policy 2020, 36, 675–711.
26. Winter-Nelson, A. International Food Safety Regulations in the United States and the European Union-Balancing Consumer Confidence and Trade: Discussion. Am. J. Agric. Econ. 2009, 91, 1491–1492.
27. Kasztelan, A. How Circular Are the European Economies? A Taxonomic Analysis Based on the INEC (Index of National Economies’ Circularity). Sustainability 2020, 12, 7613.
28. Mukherjee, A.; Sinha, U.B. Export Cartel and Consumer Welfare. Rev. Int. Econ. 2019, 27, 91–105.
29. McIntyre, K.H. Reconciling Consumer Confidence and Permanent Income Consumption. East. Econ. J. 2007, 33, 257–275.
30. Bianchi, C.; Reyes, V.; Devenin, V. Consumer Motivations to Purchase from Benefit Corporations (B Corps). Corp. Soc. Responsib. Environ. Manag. 2020, 27, 1445–1453.
31. Ghafoori, E.; Mata, F.; Borg, K.; Smith, L.; Ralston, D. Retirement Confidence: Development of an Index. Inq. J. Health Care Organ. Provision Financ. 2021, 58, 00469580211035732.
32. Kilic, E.; Cankaya, S. Consumer Confidence and Economic Activity: A Factor Augmented VAR Approach. Appl. Econ. 2016, 48, 3062–3080.
33. Gholipour Fereidouni, H.; Tajaddini, R. Housing Wealth, Financial Wealth and Consumption Expenditure: The Role of Consumer Confidence. J. Real Estate Financ. Econ. 2017, 54, 216–236.
34. Gupta, R.K. Effects of Confidence and Social Benefits on Consumers’ Extra-Role and in-Role Behaviors: A Social Identity and Social Exchange Perspective. J. Retail. Consum. Serv. 2022, 65, 102879.
35. Singh, J.; Dall’Olmo Riley, F. Consumer Perceptions and Behaviour towards Branded Commodities. J. Consum. Behav. 2022, 21, 19–32.
36. Li, S.-W.; Chen, Z.H.U.; Chen, Q.-H.; Liu, Y.-M. Consumer Confidence and Consumers’ Preferences for Infant Formulas in China. J. Integr. Agric. 2019, 18, 1793–1803.
37. Bildirici, M.E.; Badur, M.M. The Effects of Oil and Gasoline Prices on Confidence and Stock Return of the Energy Companies for Turkey and the US. Energy 2019, 173, 1234–1241.
38. Binder, C. Interest Rate Prominence in Consumer Decision-making. Econ. Inq. 2018, 56, 875–894.
39. Reed, M. A Study of Social Network Effects on the Stock Market. J. Behav. Financ. 2016, 17, 342–351.
40. Luca, D.; Schmeiser, H.; Schreiber, F. Investment Guarantees in Financial Products: An Analysis of Consumer Preferences. Geneva Pap. Risk Insur. Pract. 2022, 1–35.
41. Ferrer, E.; Salaber, J.; Zalewska, A. Consumer Confidence Indices and Stock Markets’ Meltdowns. Eur. J. Financ. 2016, 22, 195–220.
42. Demir, E.; Alıcı, Z.A.; Chi Keung Lau, M. Macro Explanatory Factors of Turkish Tourism Companies’ Stock Returns. Asia Pacific J. Tour. Res. 2017, 22, 370–380.
43. Caleiro, A.B. Learning to Classify the Consumer Confidence Indicator (in Portugal). Economies 2021, 9, 125.
44. Singh, P.; Suri, B. Quality Assessment of Data Using Statistical and Machine Learning Methods. In Computational Intelligence in Data Mining-Volume 2; Springer: New Delhi, India, 2015; pp. 89–97.
Figure 1. Consumer confidence index for 448 periods.
Table 1. Benchmark keywords related to the consumer confidence index.
| Category | Keywords |
| --- | --- |
| Macroeconomic factors | Taxes, Prices, Investment [21], Economic Growth [22], Economic Situation [23], Money Supply, Monetary Policy, Inflation [24], Macroeconomic Control, Employment [25], International Trade [26], International Settlement, National Economy [27], Expenditure [18], Industrial Value Added, Import and Export [28], Economy [23], US Dollar |
| Family financial situation | Population, Income [29], Quality of Life [30], Retirement [31], Family Economic Status [32], Commercial Housing [33], Utility Bills, Rent, Social Security [34], Medical Care, Savings, Deposit Rates, Housing Prices |
| Commodity supply [35] | Car [36], Air Conditioner Price, Food Price, Washing Machine Price, Pork Price, Decoration Material Price, Refrigerator Prices, Peanut Oil Prices, Cosmetics, Oil Prices [37], Vegetable Prices, Gasoline Prices [37], Travel, Tourist Attractions, Smart TV Prices, Liquid Petroleum Gas Prices [37], Culture |
| Financial environment | Interest Rate Cut [38], Financial Markets [39], Financial Investment [40], Financing, Central Bank, Interest Rate [38], Stock market [41], RMB Exchange Rate [42], Bank of China, Foreign Exchange Rate Rrid, Stocks |
| Employment situation | Unemployment Rate [43], Employment Situation [25], Employment Rate [25], Age |
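The framework in Table 1 is essentially a mapping from five dimensions to lists of search keywords. The sketch below assumes an abridged copy of the table (a few keywords per dimension) and inverts that mapping, so that each keyword whose search index is retrieved can be traced back to its dimension; the function name `keyword_to_dimension` is hypothetical.

```python
# Abridged copy of the Table 1 framework (a few keywords per dimension only).
FRAMEWORK = {
    "Macroeconomic factors": ["Taxes", "Prices", "Inflation", "Employment", "Economy"],
    "Family financial situation": ["Income", "Housing Prices", "Savings", "Rent"],
    "Commodity supply": ["Car", "Pork Price", "Oil Prices", "Gasoline Prices"],
    "Financial environment": ["Interest Rate", "Stock market", "RMB Exchange Rate"],
    "Employment situation": ["Unemployment Rate", "Employment Situation", "Age"],
}


def keyword_to_dimension(framework):
    """Invert the framework: map each keyword (lower-cased) to its dimension.

    If a keyword were listed under several dimensions, the first one wins.
    """
    index = {}
    for dimension, keywords in framework.items():
        for keyword in keywords:
            index.setdefault(keyword.lower(), dimension)
    return index


lookup = keyword_to_dimension(FRAMEWORK)
print(lookup["car"])  # Commodity supply
print(len(lookup))    # 19 distinct keywords in this abridged copy
```

Lower-casing the keys is a design choice that makes the lookup robust to the inconsistent capitalization of keyword labels across the tables (for example "Car" in Table 1 and "car" in a query log).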
Table 2. Correlation analysis of independent and dependent variables.
| Variables | Correlation coefficient | Variables | Correlation coefficient |
| --- | --- | --- | --- |
| House price | 0.3500 | Car | −0.4535 |
| National economy | 0.4093 | Gasoline price | 0.3345 |
| Macro-control | −0.4440 | Population | 0.3537 |
| Peanut oil price | −0.3061 | Unemployment rate | 0.3087 |
| Cosmetics | −0.3820 | Inflation | 0.3520 |
| House price | −0.3500 | Washing machine price | −0.3246 |
| Air conditioner price | −0.3817 | Oil price | 0.4585 |
| Tourism | −0.3952 | Smart TV price | −0.3441 |
| Tourist attractions | −0.3159 | Age | 0.4772 |
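The coefficients in Table 2 measure how strongly each keyword's search index co-moves with the CCI, presumably as Pearson's r (the source does not name the measure). The plain-Python sketch below computes r and screens keywords by its magnitude. The 0.3 cut-off is a hypothetical rule, inferred only from the fact that every tabulated |r| exceeds 0.3, and the two series are made-up illustrative numbers.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (>= 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def screen_keywords(search_series, cci, threshold=0.3):
    """Keep keywords whose |r| with the CCI reaches the (hypothetical) threshold."""
    kept = {}
    for keyword, series in search_series.items():
        r = pearson(series, cci)
        if abs(r) >= threshold:
            kept[keyword] = round(r, 4)
    return kept


# Hypothetical monthly data: a CCI series and two keyword search indices.
cci = [108.3, 109.0, 109.4, 109.7, 109.8, 109.6]
searches = {
    "tourism": [520, 480, 455, 430, 420, 440],  # moves against the CCI
    "taxes": [300, 310, 295, 305, 300, 298],    # roughly unrelated
}
print(screen_keywords(searches, cci))
```

Screening on |r| rather than r keeps negatively correlated keywords such as "tourism" here, consistent with Table 2 retaining variables of both signs.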
Table 3. Correlation analysis of independent and dependent variables.
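| CCI | Coefficient | Std. Err. | t | p > \|t\| |
| --- | --- | --- | --- | --- |
| House price | −0.0003103 | 0.0000488 | −6.36 | 0.000 |
| Peanut oil price | −0.0035622 | 0.0012912 | −2.76 | 0.006 |
| Air conditioner price | 0.0015601 | 0.0002707 | −5.76 | 0.000 |
| Tourism | 0.000177 | 0.0000492 | 3.60 | 0.000 |
| Automobile | −0.0000403 | 0.0000121 | −3.34 | 0.001 |
| Population | 0.0011956 | 0.0003957 | 3.02 | 0.003 |
| Inflation | 0.0003821 | 0.0001415 | 2.70 | 0.007 |
| Oil price | 0.0000423 | 0.0000559 | 7.57 | 0.000 |
| Cons | 105.3676 | 0.5075212 | 207.61 | 0.000 |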
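Table 3 appears to report an ordinary least squares regression of the CCI on the selected search indices: each t value is the coefficient divided by its standard error (for house price, −0.0003103 / 0.0000488 ≈ −6.36). The NumPy sketch below shows how such Coefficient, Std. Err. and t columns are obtained. It assumes classical (homoskedastic) standard errors, since the source does not say which were used, and it runs on synthetic data with known coefficients rather than the authors' series.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares with an intercept.

    Returns coefficients (intercept first), classical standard errors and
    t-statistics, i.e. the Coefficient, Std. Err. and t columns of a table.
    """
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)        # unbiased residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)   # covariance of the estimates
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se


# Hypothetical data: 120 months of two search indices; the CCI is built from
# known coefficients plus noise so the estimates can be checked.
rng = np.random.default_rng(0)
X = rng.uniform(1000, 5000, size=(120, 2))
y = 105.0 - 0.0003 * X[:, 0] + 0.0012 * X[:, 1] + rng.normal(0, 0.2, 120)
beta, se, t = ols(X, y)
print(np.round(beta, 5), np.round(t, 2))
```

Because raw search volumes are in the thousands, the slope coefficients come out very small, which is also why the Table 3 coefficients are of order 10⁻⁴ while the constant is near 105.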
Table 4. Comparison of true values and predicted values from different models. The second column is the true value and the next six columns are the predicted values from each model.
| Date | True | BPNN | RFR | SVR | ELMAN | ELM | CNN |
| --- | --- | --- | --- | --- | --- | --- | --- |
| August 2021 | 108.309 | 108.394165 | 107.8359 | 107.2346738 | 106.218712 | 106.74613 | 107.572832 |
| September 2021 | 108.9626 | 108.605077 | 108.2389 | 107.7928709 | 106.6612107 | 106.94792 | 108.130195 |
| October 2021 | 109.4258 | 109.454897 | 109.2639 | 108.7083261 | 106.9071076 | 107.92557 | 109.395235 |
| November 2021 | 109.7463 | 109.03774 | 109.3563 | 108.3975876 | 106.8571451 | 107.96292 | 109.009328 |
| December 2021 | 109.815 | 109.942503 | 109.4511 | 108.545021 | 106.6652814 | 107.60155 | 109.68998 |
| January 2022 | 109.6322 | 108.729639 | 108.1444 | 107.6673171 | 106.8873609 | 107.64436 | 108.533012 |
| February 2022 | 109.2735 | 109.065115 | 108.8455 | 108.9324908 | 110.2593407 | 108.99475 | 109.12965 |
| March 2022 | 108.8485 | 108.800697 | 108.8 | 108.7563591 | 108.4962227 | 109.13962 | 108.8893175 |
| April 2022 | 108.3104 | 108.716093 | 108.2861 | 108.589573 | 107.5439924 | 108.25041 | 108.450708 |
| May 2022 | 107.7686 | 107.58529 | 107.5692 | 107.6994682 | 108.3479212 | 108.35107 | 107.6327675 |
| June 2022 | 107.3337 | 107.616705 | 107.2682 | 107.5320267 | 108.0668133 | 108.38024 | 107.497395 |
| July 2022 | 107.1 | 107.831361 | 107.3091 | 107.5510443 | 109.1909602 | 109.7658 | 107.27576 |
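The ELM column in Table 4 comes from an extreme learning machine, whose hidden-layer weights are random and fixed while only the output weights are fitted in closed form. The NumPy sketch below shows that model class under stated assumptions: sigmoid hidden units, 40 hidden neurons, a target centred before fitting, and a chronological hold-out of the last 12 observations (mirroring the August 2021 to July 2022 test window). The data are synthetic, and the class name, settings and helper `chronological_split` are hypothetical, not the authors' configuration.

```python
import numpy as np


class ELMRegressor:
    """Minimal extreme learning machine: one hidden layer whose input weights
    are drawn at random and never trained; only the output weights are
    fitted, in closed form, by least squares."""

    def __init__(self, n_hidden=40, seed=0):
        self.n_hidden = n_hidden  # hypothetical setting, not the paper's
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activation of the random affine projection of the inputs.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        self.y_mean = y.mean()  # centre the target so no output bias is needed
        H = self._hidden(X)
        # Output weights: Moore-Penrose pseudoinverse solution of H @ beta = y.
        self.beta = np.linalg.pinv(H) @ (y - self.y_mean)
        return self

    def predict(self, X):
        return self._hidden(np.asarray(X, dtype=float)) @ self.beta + self.y_mean


def chronological_split(X, y, n_test=12):
    """Hold out the last n_test observations without shuffling."""
    return X[:-n_test], X[-n_test:], y[:-n_test], y[-n_test:]


# Hypothetical data: 120 months of three standardized search indices.
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 3))
y = 108 + X @ np.array([0.5, -0.3, 0.2]) + rng.normal(0, 0.05, 120)
X_tr, X_te, y_tr, y_te = chronological_split(X, y)
model = ELMRegressor().fit(X_tr, y_tr)
pred = model.predict(X_te)
print(np.round(pred - y_te, 3))  # errors as in Table 5: predicted minus true
```

Splitting chronologically rather than at random matters for a monthly index such as the CCI: a shuffled split would let the model train on months that come after the ones it is evaluated on.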
Table 5. Error values predicted by different methods. The error value is the difference between the predicted value and the true value.
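| Month | BP | RFR | SVR | ELMAN | ELM | CNN |
| --- | --- | --- | --- | --- | --- | --- |
| August 2021 | 0.085212163 | −0.4731 | −1.07428 | −2.09024 | −1.56282 | −0.73612 |
| September 2021 | −0.35756486 | −0.72379 | −1.16977 | −2.30143 | −2.01472 | −0.83245 |
| October 2021 | 0.029133199 | −0.16185 | −0.71744 | −2.51866 | −1.50019 | −0.03053 |
| November 2021 | −0.70855535 | −0.39003 | −1.34871 | −2.88915 | −1.78338 | −0.73697 |
| December 2021 | 0.127486923 | −0.3639 | −1.27 | −3.14974 | −2.21347 | −0.12504 |
| January 2022 | −0.90252581 | −1.48776 | −1.96485 | −2.7448 | −1.98781 | −1.09915 |
| February 2022 | −0.20841109 | −0.42806 | −0.34104 | 0.985814 | −0.27877 | −0.14388 |
| March 2022 | −0.04779769 | −0.04852 | −0.09214 | −0.35227 | 0.291121 | 0.040823 |
| April 2022 | 0.405707295 | −0.02432 | 0.279187 | −0.76639 | −0.05998 | 0.140322 |
| May 2022 | −0.18330337 | −0.19939 | −0.06913 | 0.579328 | 0.582477 | −0.13583 |
| June 2022 | 0.282965945 | −0.06551 | 0.198287 | 0.733074 | 1.046502 | 0.163656 |
| July 2022 | 0.731360881 | 0.209056 | 0.451044 | 2.09096 | 2.665804 | 0.17576 |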
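Table 5 follows directly from Table 4: each entry is the predicted value minus the true value. Recomputing the first three months for two of the models, as in the sketch below, reproduces the tabulated errors to about four decimal places; the small remaining differences presumably arise because the true values in Table 4 are printed rounded.

```python
# First three rows of Table 4: true CCI values and two of the model columns.
true_cci = [108.309, 108.9626, 109.4258]
predicted = {
    "RFR": [107.8359, 108.2389, 109.2639],
    "CNN": [107.572832, 108.130195, 109.395235],
}

# Error as defined for Table 5: predicted value minus true value.
errors = {
    model: [round(p - t, 5) for p, t in zip(values, true_cci)]
    for model, values in predicted.items()
}
print(errors)
```

A negative error therefore means the model under-predicted the CCI, which is the case for every model except BP in most of the autumn 2021 months.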
Table 6. Model accuracy comparison.
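| Metric | BP | RFR | SVR | ELMAN | ELM | CNN |
| --- | --- | --- | --- | --- | --- | --- |
| MAE | 0.339169 | 0.381273 | 0.747988 | 1.766822 | 1.332254 | 0.363376 |
| RMSE | 0.193975 | 0.295483 | 0.89833 | 4.061642 | 2.457531 | 0.26007 |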
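MAE is the mean absolute error and RMSE the square root of the mean squared error over the 12 test months. Applied to the BP column of Table 5, the sketch below reproduces the MAE entry (0.339169). The mean of the squared BP errors comes to about 0.193975, the value printed in the RMSE row, while its square root is about 0.44; since RMSE can never be smaller than MAE, the RMSE row appears to list mean squared errors. The ranking of the models is unaffected, because the square root is monotonic.

```python
import math

# BP column of Table 5 (predicted minus true, August 2021 to July 2022).
bp_errors = [0.085212163, -0.35756486, 0.029133199, -0.70855535,
             0.127486923, -0.90252581, -0.20841109, -0.04779769,
             0.405707295, -0.18330337, 0.282965945, 0.731360881]


def mae(errors):
    """Mean absolute error."""
    return sum(abs(e) for e in errors) / len(errors)


def mse(errors):
    """Mean squared error."""
    return sum(e * e for e in errors) / len(errors)


def rmse(errors):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(errors))


print(round(mae(bp_errors), 6), round(mse(bp_errors), 6), round(rmse(bp_errors), 4))
```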
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Han, H.; Li, Z.; Li, Z. Using Machine Learning Methods to Predict Consumer Confidence from Search Engine Data. Sustainability 2023, 15, 3100. https://doi.org/10.3390/su15043100

AMA Style

Han H, Li Z, Li Z. Using Machine Learning Methods to Predict Consumer Confidence from Search Engine Data. Sustainability. 2023; 15(4):3100. https://doi.org/10.3390/su15043100

Chicago/Turabian Style

Han, Huijian, Zhiming Li, and Zongwei Li. 2023. "Using Machine Learning Methods to Predict Consumer Confidence from Search Engine Data." Sustainability 15, no. 4: 3100. https://doi.org/10.3390/su15043100

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
