Review

A Multi-Method Survey on the Use of Sentiment Analysis in Multivariate Financial Time Series Forecasting

by Charalampos M. Liapis *, Aikaterini Karanikola * and Sotiris Kotsiantis
Department of Mathematics, University of Patras, 26504 Patras, Greece
* Authors to whom correspondence should be addressed.
Entropy 2021, 23(12), 1603; https://doi.org/10.3390/e23121603
Submission received: 3 November 2021 / Revised: 25 November 2021 / Accepted: 26 November 2021 / Published: 29 November 2021
(This article belongs to the Special Issue Review Papers for Entropy)

Abstract:
In practice, time series forecasting involves the creation of models that generalize data from past values and produce future predictions. Moreover, regarding financial time series forecasting, it can be assumed that the procedure involves phenomena partly shaped by the social environment. Thus, the present work is concerned with the study of the use of sentiment analysis methods in data extracted from social networks and their utilization in multivariate prediction architectures that involve financial data. Through an extensive experimental process, 22 different input setups using such extracted information were tested, over a total of 16 different datasets, under the schemes of 27 different algorithms. The comparisons were structured under two case studies. The first concerns possible improvements in the performance of the forecasts in light of the use of sentiment analysis systems in time series forecasting. The second, having as a framework all the possible versions of the above configuration, concerns the selection of the methods that perform best. The results, as presented by various illustrations, indicate, on the one hand, the conditional improvement of predictability after the use of specific sentiment setups in long-term forecasts and, on the other, a universal predominance of long short-term memory architectures.

1. Introduction

The observation of the evolution of various time-dependent phenomena, as well as decision-making based on structures that predict their future behavior, have greatly shaped the course of human history. The need of the human species for knowledge of the possible future outcomes of various events could only lead to the development and use of methods aimed at extracting reliable predictions. Their success, however, is not necessarily inferred from the emergence of this need. The research field of predicting sequential and time-dependent phenomena is called time series forecasting.
Specifically, time series forecasting is the process in which the future values of a variable describing features of a phenomenon are predicted based on existing historical data using a specific fit abstraction, i.e., a model. All such time-dependent features containing past observations are represented as time series. The latter then constitute the input of each forecasting procedure. Time series are sequences of time-dependent observations extracted at specific time points used as their indexes. The sampling rate varies according to the requirements and the nature of the problem. In addition, depending on the number of attributes, i.e., the dependent variables whose values are collected at each time step, a distinction is made between univariate and multivariate time series [1]. Such methods find application in a wide range of time-evolving problems. Some examples include rainfall forecasts [2], gold [3] or stock market price predictions [4], as well as forecasting the evolution of epidemics such as the current COVID-19 pandemic [5,6]. The domain has flourished in recent decades, as the demand for ever-better models remains urgent: their use can greatly contribute to the optimization of decision-making and thus lead to better results in various areas of human interest.
In terms of forecasting procedures, during the first decades of development, methods derived from statistics dominated the field. This was based on the reasonable assumption that, given the nature of the problem, knowing the statistical characteristics of time series is the key to understanding their structure, and therefore predicting their future behavior. Currently, these methods—although still widely used—have been largely surpassed in performance by methods derived from the field of machine learning. Numerous such predictive schemes are based on regression models [7,8], while recently, deep-machine-learning architectures such as long short-term memory (LSTM) [9,10] are gaining ground. In addition, advances in natural language processing in conjunction with the fact that many time-dependent phenomena are influenced by public opinion lead to the hypothesis that the use of linguistic modeling containing information related to the phenomenon in question could improve the performance of forecasting procedures. Data containing relevant information is now easy to retrieve due to the rapid growth of the World Wide Web initially and social networks in recent years, and it is therefore reasonable to examine the utilization of such textual content in predictive schemes.
This work is a continuation of a previous comparative study of statistical methods for univariate time series forecasting [11], which now focuses on methods belonging to the category of machine learning. The comparisons involve results from an extended experimental procedure covering a wide range of multivariate-time-series-forecasting setups that include sentiment scores, tested in the field of financial time series forecasting. Below, the presentation of the results is grouped as follows: two distinct case studies were investigated, the first of which concerns the use of sentiment analysis in time series forecasting, while the second contains the comparison of different time-series-prediction methods, all of which were fitted on datasets containing sentiment score representations. In each of these two scenarios, the evaluation of the results was performed by calculating six different metrics. Three forecast scenarios were implemented: single-day, seven-day, and fourteen-day forecasts, for each of which the results are presented separately.

2. Related Work

The field of time series forecasting constitutes—as already mentioned—a very active area of research. Growing demand for accurate forecasts has been consistently established over the last few decades for many real-world tasks. Various organizations, from companies and cooperatives to governments, frequently rely on the outcomes of forecasting models for decisions aimed at reducing risk and improving outcomes. A constant pursuit of increasing predictive accuracy and robustness has led the scientific community in several different research directions. In this context, and provided that there is a strong correlation between the views of individuals and the course of specific sequential and time-dependent phenomena, it is both reasonable and expected to approach such problems by intersecting the field of forecasting with that of opinion mining [12,13]. Thus, there are several approaches that focus on trying to integrate information extracted using sentiment analysis techniques into predictive scenarios. This section tracks the relevant literature, focusing on works that investigate the aforementioned approach.
Time-series-forecasting problems can be reduced to two broad categories. The first one consists of tasks in which the general future behavior of a time series must be predicted. Such problems can be considered classification problems. On the other hand, when the forecast outputs the specific future values that a time series is expected to take, then the whole process can be reframed as a regression task. Regarding the first class of problems, the relevant literature contains a number of quite interesting works. In [14], a novel method that estimates social attention to stocks by sentiment analysis and influence modeling was proposed to predict the movement of the financial market when the latter is formalized as a classification problem. Five well-known classifiers were tested on Chinese stock data to assess the efficiency of the method. For the same purpose, a traditional ARIMA model was used, together with information derived from the analysis of Twitter data [15], strongly suggesting that the exploitation of public opinion enhances the possibility of correctly predicting the rise or fall of stock markets. Similar results were achieved in [16], where the application of text-mining technology to quantify the unstructured data containing social media views on stock-related news into sentiment scores increased the performance of the logistic regression algorithm. A more sophisticated approach that employs deep sentiment analysis was used to improve the performance of an SVM-based method in [17], indicating once again that sentiment features have a beneficial effect on the prediction.
Predicting the actual future values of a time series, on the other hand, is a task far more difficult than predicting merely the direction of a time series. Therefore, there are a significant number of studies directed towards this research area as well. In [18], different text preprocessing strategies for correlating the sentiment scores from Twitter-scraped textual data with Bitcoin prices during the COVID-19 pandemic were compared, to identify the optimum preprocessing strategy that would prompt machine learning prediction models to achieve better accuracy. Twitter data were also used in [19] to predict the future value of the SSECI (Shanghai Stock Exchange Composite Index) by applying a NARX time series model combined with a weighted sentiment representation extracted from tweets. In [20], sentiment analysis of RSS news feeds combined with information on SENSEX points was used to improve the accuracy of stock market prediction, indicating that the use of sentiment polarity improves the prediction, although the experimental procedure involved data related to only a single stock and a small number of compared algorithms.
As recent research work has indicated, there is a series of applications where deep-learning methods tend to perform better than both the traditional statistical [21] and the machine-learning-based ones [22]; it is therefore expected that such methods would also be used along with sentiment analysis techniques to achieve even greater accuracy in forecasting tasks. In [23], an improved LSTM model with an attention mechanism was used on AAPL (NASDAQ ticker symbol for Apple Inc.) stock data, after adopting empirical mode decomposition (EMD) on complex sequences of stock price data, utilizing investors’ sentiment to forecast stocks, while in [24], the experimental procedure over six different datasets indicated that the fusion of network public opinion and realistic transaction data can significantly improve the performance of LSTMs. Both works demonstrated that the use of sentiment modeling improves the performance of LSTMs, but the amount of data used does not seem to be sufficient to substantiate a clear and general conclusion.
In addition, in several works [25,26] ensemble-based techniques have also been utilized together with sentiment analysis for time series forecasting in order to exploit the benefits of ensemble theory. In [27], an ensemble method, formed by combining LSTMs and ARIMA models under a feedforward neural network scheme, was proposed in order to predict future values of stock prices, utilizing sentiment analysis on data provided by scraping news related to the stock from the Internet. Moreover, an ensemble scheme that combines two well-known machine-learning algorithms, namely support vector machine (SVM) and random forest, utilizing information related to the public’s opinion about certain companies by incorporating sentiment analysis by the use of a trained word2vec model was proposed in [28]. Despite the results taken from the experimental procedure indicating that there were cases in which the ensemble model performed better than its constituents, the overall performance of the model depended on both the volume and the nature of the data available.
In terms of extended studies that focus on the extensive comparison of several different methods, in which multiple sentiment analysis schemes are also incorporated to predict the future values of time series, to our knowledge, only a relatively limited number of works exist in the current literature. Some of them are listed below. Various traditional ML algorithms, as well as LSTM architectures, were tested over financial data by exploiting sentiment analysis on Twitter data in [29], while a survey of articles focusing on methods that refine the predictions of stock market time series using financial news from Twitter, along with a discussion regarding the improvement of their performance by speeding up the computation, can be found in [30]. Given the above, the present work aspires to provide a credible insight into the subject, specifically regarding the behavior of a large number of forecasting methods in light of their integration with sentiment analysis techniques.

3. Experimental Procedure

In the extensive series of experiments performed, a total of 27 algorithms were tested for their performance in relation to a corresponding multivariate dataset consisting, on the one hand, of the time series containing the daily closing values of each stock as a fixed input component and, on the other, of one of a plurality of 22 different sentiment score setups. A total of 16 initial datasets of stocks containing such closing price values from a period of three years, from 2 January 2018 to 24 December 2020, were used. Three different sentiment analysis methods were utilized to generate sentiment scores from linked textual data extracted from the Twitter microblogging platform. Moreover, a seven-day rolling mean strategy was applied to the sentiment scores, leading to six distinct time-dependent features. For each algorithm, 22 combinations of distinct input components, formed from the calculated sentiment scores together with the closing values, were tested under the multivariate forecasting scheme. Thus, given the aforementioned number of features and setups, a total of 16 datasets × 22 combinations × 27 algorithms × 3 shifts = 28,512 experiments were performed.
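As a quick check of the experiment count, the following minimal sketch (not the authors' code) reproduces the grid described above, assuming the 22 setups arise from the no-sentiment case, the 6 sentiment-valued series, and all of their pairwise combinations:

```python
from itertools import combinations

# 3 sentiment methods plus their 7-day rolling means give 6 sentiment-valued series.
sentiment_series = ["B", "V", "F", "B7", "V7", "F7"]

# 1 no-sentiment (univariate) case + 6 single-score setups + 15 pairwise setups = 22.
setups = ["NS"] + sentiment_series \
       + ["".join(pair) for pair in combinations(sentiment_series, 2)]

n_datasets, n_algorithms, n_shifts = 16, 27, 3   # stocks, methods, forecast horizons
print(len(setups))                                               # 22
print(n_datasets * len(setups) * n_algorithms * n_shifts)        # 28512
```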

3.1. Datasets

As already mentioned, 16 different initial datasets containing the time series of the closing values of sixteen well-known listed companies were used. All sets include data from the aforementioned three-year period, meaning dates starting from 2 January 2018 to 24 December 2020. Table 1 shows the names and abbreviations of all the shares used.
However, each of the above time series containing the closing prices of the shares was only one of the features of the final multivariate dataset. For each share, the final datasets were composed by introducing features derived from a sentiment analysis process, which was applied to an extended corpus of tweets related to each such stock. Figure 1 depicts a representation of the whole process, from data collection to the creation of the final sets. Below is a brief description of each stage of the final-dataset-construction process.

3.1.1. Raw Textual Data

First, a large number of posts related to each stock were collected from Twitter and grouped per day. These text data include tweets written exclusively in English. Specifically, the tweets were downloaded using the Twitter Intelligence Tool (TWINT) [31], an easy-to-use Python-based Twitter scraper. TWINT is an advanced, standalone, yet relatively straightforward tool for downloading data from user profiles. With this tool, a thorough search for stock-related reports to be investigated—that is, tweets that were directly or indirectly linked to the share under consideration—resulted in a rather extensive body of text data, consisting of day-to-day views or attitudes towards stocks of interest. These collections were then preprocessed and passed to the sentiment score extraction modules.
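For illustration, a minimal collection sketch with TWINT might look as follows; the exact search terms, filters, and additional options used by the authors are not specified in the text and are assumptions here:

```python
import twint

def collect_tweets(query: str, since: str, until: str):
    """Download English-language tweets matching `query` between two dates."""
    c = twint.Config()
    c.Search = query          # e.g., a ticker symbol or company name (assumed)
    c.Lang = "en"             # tweets written exclusively in English
    c.Since = since           # "2018-01-02"
    c.Until = until           # "2020-12-24"
    c.Pandas = True           # keep the results in a pandas DataFrame
    c.Hide_output = True
    twint.run.Search(c)
    return twint.storage.panda.Tweets_df

tweets = collect_tweets("AAPL", "2018-01-02", "2020-12-24")
daily_groups = tweets.groupby(tweets["date"].str[:10])   # group the collected tweets per day
```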

3.1.2. Text Preprocessing

Next, the text-preprocessing step schematically presented in Figure 2 followed. Specifically, after the initial removal of irrelevant hyperlinks and URLs, using the re Python library [32], each tweet was converted to lowercase and split into words. A series of numerical strings and terms of no interest taken from a manually created set was then removed. Lastly, after the necessary joins to bring each text back to its initial structure, each tweet was tokenized into sentences using the NLTK [33,34] library, and targeted punctuation removal was applied using the string [35] module.
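A minimal sketch of this preprocessing chain is given below; the set of removed terms is a placeholder, since the manually created list used by the authors is not given in the text:

```python
import re
import string

import nltk
from nltk.tokenize import sent_tokenize

nltk.download("punkt", quiet=True)

UNWANTED_TERMS = {"rt", "amp"}   # hypothetical stand-in for the manually created term set

def preprocess_tweet(text: str) -> list:
    text = re.sub(r"http\S+|www\.\S+", "", text)              # remove hyperlinks and URLs
    words = text.lower().split()                               # lowercase and split into words
    words = [w for w in words
             if not w.isdigit() and w not in UNWANTED_TERMS]   # drop numeric strings / unwanted terms
    rejoined = " ".join(words)                                 # join back into a single text
    sentences = sent_tokenize(rejoined)                        # NLTK sentence tokenization
    table = str.maketrans("", "", string.punctuation)          # targeted punctuation removal
    return [s.translate(table) for s in sentences]
```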

3.1.3. Sentiment Scores

The next step involved generating the sentiment scores from the collected tweets. In this work, three distinct sentiment analysis methods were used, namely the sentiment module from TextBlob [36], the Vader Sentiment Analysis tool [37], and FinBERT [38], a financially oriented fine-tuning of the BERT [39] language representation model. For each of the above, the per-tweet scores extracted each day were averaged, and these daily mean values formed the sequential, time-dependent instances that constituted the sentiment-valued time series of the corresponding method. It should be noted that, in addition to the three valuations extracted by the above procedures, a seven-day moving average was also applied to each sentiment-valued time series. Thus, six distinct sentiment-valued time series were generated, the combinations of which, along with the no-sentiment, i.e., univariate, case, led to the 22 different study cases. These, combined with the closing price data, constituted a single distinct experimental procedure for every algorithm. Below is a rough description of the three methods mentioned earlier; a short sketch of the scoring and daily-averaging pipeline follows the list:
  • TextBlob: TextBlob is a Python-based framework for manipulating textual data. In this work, using the sentiment property from the above library, the polarity score—that is, a real number within the [−1, 1] interval—was generated for every downloaded tweet. As has already been pointed out, a simple averaging scheme was then applied to the numerical output of the algorithm to produce a single sentiment value that represents the users’ attitude per day. The method, being a rule-based sentiment-analysis algorithm, works by calculating the value attributed to the corresponding sentiment score by simply applying a manually created set of rules. For example, counting the number of times a particular term appears in a given section adjusts the overall estimated sentiment score values in proportion to the way this term is evaluated;
  • Vader: Vader is also a simple rule-based method for general sentiment analysis. The Vader Sentiment Analysis tool in practice works as follows: given a string—in this work, the textual elements of each tweet—SentimentIntensityAnalyzer() returns a dictionary containing negative, neutral, and positive sentiment values, and a compound score produced by a normalization of the latter three. Again, maintaining only the “compound” value for each tweet, a normalized average of all such scores was generated for each day, resulting in a final time series that had those daily sentiment scores—ranging within the [−1, 1] interval—as its values;
  • FinBERT: FinBERT is a sentiment analysis pre-trained natural-language-processing (NLP) model that is produced by fine-tuning the BERT model over financial textual data. BERT, standing for bidirectional encoder representations from transformers, is an architecture for NLP problems based on the transformers. Multi-layer deep representations of linguistic data are trained under a bidirectional attention strategy from unlabeled data in a way that the contexts of each token constitute the content of its embedding. Moreover, targeting specific tasks, the model can be fine-tuned using just another layer. In essence, it is a pre-trained representational model, according to the principles of transfer learning. Here, using the implementation contained in [40], and especially the model trained on the PhraseBank presented in [41], the daily sentiment scores were extracted, and—according to the same pattern as before—a daily average was produced.
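As a rough sketch of the scoring and daily-averaging pipeline described above (assuming a `tweets` DataFrame with `date` and `tweet` columns), the TextBlob and Vader parts might look as follows; the FinBERT scores, obtained from the cited fine-tuned model, would feed the same daily-averaging step and are omitted here:

```python
import pandas as pd
from textblob import TextBlob
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

vader = SentimentIntensityAnalyzer()

def blob_score(text: str) -> float:
    return TextBlob(text).sentiment.polarity            # polarity in [-1, 1]

def vader_score(text: str) -> float:
    return vader.polarity_scores(text)["compound"]      # normalized compound score in [-1, 1]

def daily_sentiment(tweets: pd.DataFrame, scorer) -> pd.Series:
    """Average per-tweet scores per day to obtain a sentiment-valued time series."""
    scores = tweets["tweet"].apply(scorer)
    return scores.groupby(tweets["date"]).mean()

# Two of the six sentiment-valued series: raw daily TextBlob scores and their
# seven-day rolling mean (the remaining four follow the same pattern).
# blob_daily = daily_sentiment(tweets, blob_score)
# blob_roll7 = blob_daily.rolling(window=7, min_periods=1).mean()
```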

3.2. Algorithms

Now, regarding the algorithms used, it was already reported that 27 different methods were compared. Given this number, it is practically impossible to present the theoretical properties of each algorithm in detail. Instead, a simple reference is provided, and the reader is encouraged to consult the corresponding citations for further information. Table 2 lists, in alphabetical order, all the algorithms used during the experimental process.
Experiments were run in the Python programming language using the Keras [67] open-source software library and PyCaret [68,69], an open-source, low-code machine-learning framework. It should also be noted that the problem of predicting the future values of the given time series was formalized as a regression problem. The forecasts were produced under one single-step and two multi-step prediction scenarios. Specifically, regarding multi-step forecasts, estimates were generated for a seven-day window, on the one hand, and a fourteen-day window, on the other. All algorithms were used in their basic configurations, with no hyperparameter optimization taking place whatsoever.
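For illustration, a minimal sketch of how such a multivariate series can be framed as a supervised regression task and fed to a basic Keras LSTM is given below; the lookback window and layer size are assumptions, since the text only states that basic, untuned configurations were used:

```python
import numpy as np
from tensorflow import keras

def make_windows(values: np.ndarray, lookback: int = 7, shift: int = 1):
    """values: (time, features) array, with the closing price as column 0 (the target)."""
    X, y = [], []
    for t in range(lookback, len(values) - shift + 1):
        X.append(values[t - lookback:t])      # past `lookback` days of all input features
        y.append(values[t + shift - 1, 0])    # closing price `shift` days ahead
    return np.array(X), np.array(y)

def basic_lstm(n_features: int, lookback: int = 7) -> keras.Model:
    model = keras.Sequential([
        keras.layers.Input(shape=(lookback, n_features)),
        keras.layers.LSTM(50),                # basic configuration, no hyperparameter tuning
        keras.layers.Dense(1),                # single regression output
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

# Example: a fourteen-day-ahead forecast setup.
# X, y = make_windows(dataset_values, lookback=7, shift=14)
# model = basic_lstm(n_features=X.shape[2])
# model.fit(X, y, epochs=50, verbose=0)
```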

3.3. Metrics

Moving on to the prediction performance estimates, given the comparative nature of the present work, the description of the evaluation metrics that follows is a little more detailed. The following six metrics were used: MSE, RMSE, RMSLE, MAE, MAPE, and R². The abbreviations are defined within the following subsections. Specifically, below is a presentation of these metrics, along with some insight regarding their interpretation. In what follows, the actual values of the observations are denoted by $y_{a_i}$ and the forecast values by $y_{p_i}$.

3.3.1. MSE

The mean squared error (MSE) is simply the average of the squares of the differences between the actual values and the predicted values.
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_{p_i} - y_{a_i}\right)^{2} \qquad (1)
The square power ensures the absence of negative values while keeping the information from small errors, i.e., minor deviations between the forecast and the actual values, usable. It is evident, of course, that the greater the deviation of the predicted value from the actual one, the greater the penalty imposed by the MSE. A direct consequence of this is that the metric is greatly affected by the existence of outliers. Conversely, when the difference between the forecast and the actual value is less than one, the above interpretation works—in a sense—in reverse, resulting in an overestimation of the model’s predictive capacities. Because it is differentiable and can easily be optimized, the MSE constitutes a rather common forecast evaluation metric. It should be noted that the unit of measurement of the MSE is the square of the unit of measurement of the variable being predicted.

3.3.2. RMSE

The RMSE is essentially an extension of the MSE: to compute it, one simply takes the square root of the above.
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{p_i} - y_{a_i}\right)^{2}} \qquad (2)
That is, in our case, this is the quadratic mean (root mean square) of the differences between forecasts and actual, previously observed values. The formalization gives a representation of the average distance of the actual values from the predicted ones. The latter becomes easier to understand if one ignores the denominator in the formula: we observe that the formula is the same as that of the Euclidean distance, so dividing by the number n of the observations results in the RMSE being considered as a kind of normalized distance. As with the MSE, the RMSE is affected by the existence of outliers. An essential role in the interpretability and, consequently, in the use of the RMSE is played by the fact that it is expressed in the same units as the target variable and not in their square, as in the MSE. It should also be noted that this metric is scale-dependent and can only be used to compare forecast errors of different models or model variations for a particular given variable.

3.3.3. RMSLE

Below, in Equation (3), looking inside the square root, one notices that the RMSLE metric is a modified version of the MSE, a modification that is preferred in cases where the forecasts exhibit a significant deviation.
\mathrm{RMSLE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\log(y_{p_i}+1) - \log(y_{a_i}+1)\right)^{2}} \qquad (3)
As already mentioned, the MSE imposes a large “penalty” in cases where the forecast value deviates significantly from the actual value, a fact for which the RMSLE compensates. As a result, this metric is resistant to the existence of both outliers and noise. For this purpose, it utilizes the logarithms of the actual and the forecast value. The value of one is added to both the predicted and actual values in order to avoid cases where there is a logarithm of zero. It is straightforward that the RMSLE cannot be used when there exist negative values. Using the property $\log(y_{p_i}+1) - \log(y_{a_i}+1) = \log\frac{y_{p_i}+1}{y_{a_i}+1}$, it becomes clear that this metric actually works as a relative error between the actual value and the predicted value. It is worth noting that the RMSLE attributes more weight to cases where the predicted value is lower than the actual one than to cases where the forecast is higher than the observation. It is, therefore, particularly useful in certain types of forecasts (e.g., sales, where lower forecasts may lead to stock shortages if actual demand exceeds the projection).

3.3.4. MAE

The MAE is probably the most straightforward metric to calculate. It is the arithmetic mean of the absolute errors (where the “error” is the difference between the predicted value and the actual value), assuming that all of them have the same weight.
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_{p_i} - y_{a_i}\right| \qquad (4)
The result is expressed (as in the RMSE) in the unit of measurement of the target variable. Regarding the existence of outliers, and given the absence of exponents in the formula, the MAE metric displays quite good behavior. Lastly, this metric—as the RMSE—depends on the scale of the observations. It can be used mainly to compare methods when predicting the same specific variable rather than different ones.

3.3.5. MAPE

The MAPE stands for mean absolute percentage error. This metric is quite common for calculating the accuracy of forecasts, as it represents a relative and not an absolute error measure.
\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_{p_i} - y_{a_i}}{y_{a_i}}\right| \qquad (5)
Accuracy is represented as a percentage: in Equation (5), we observe that the MAPE is calculated as the average of the absolute differences between the prediction and the actual value, divided by the observation. A multiplication by 100 can then transform the output value into a percentage. The MAPE cannot be calculated when the actual value is equal to zero. Moreover, it should be noted that if the forecast values are much higher than the actual ones, then the MAPE may exceed the 100% rate, while when both the prediction and the observation are low, it may not even approach 100%, leading to the erroneous conclusion that the predictive capacities of the model are limited, when in fact the error values may be low (although, in theory, the MAPE is a percentage, in practice it can take values in $[0, \infty)$). The way it is calculated also tends to give more weight to cases where the predicted value is higher than the observation, thus leading to more significant errors. Therefore, there is a preference for using this metric with methods that produce low prediction values. Its main advantage is that it is not scale-dependent, so it can be used for comparisons across different time series, unlike the metrics presented above.

3.3.6. R²

Lastly, the coefficient of determination R² is the ratio of the variance explained by the model to the total variance of the actual values of the dependent variable.
R^{2} = 1 - \frac{SS_{\mathrm{RES}}}{SS_{\mathrm{TOT}}} = 1 - \frac{\sum_{i=1}^{n}\left(y_{p_i} - y_{a_i}\right)^{2}}{\sum_{i=1}^{n}\left(y_{a_i} - \bar{y}_{a}\right)^{2}} \qquad (6)
This metric is a measure of good fitting, as it attempts to quantify how well the regression model fits the data. Therefore, it is essentially not a measure of the reliability of the model. Typically, the values of R² range from 0 to 1. The value of zero corresponds to the case where the explanatory variables do not explain the variance of the dependent variable at all, while the value of one corresponds to the case where the explanatory variables fully explain the dependent variable. In other words, the closer the value of R² is to one, the better the model fits the observations (historical data), meaning the forecast values will be closer to the actual ones. However, there are cases where the output of R² goes beyond the above range and takes negative values. In this case (which is one allowed by its calculation formula), we conclude that our model has a worse performance (where “performance” means “data fitting”) than the simple horizontal line; in other words, the model does not follow the data trend. Concluding, values outside the above range—i.e., greater than one or less than zero—either suggest the unsuitability of the model or indicate other errors in its implementation, such as the use of meaningless constraints.
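For reference, a small sketch implementing the six metrics exactly as defined in Equations (1)–(6), with y_a the actual and y_p the predicted values, could look as follows:

```python
import numpy as np

def evaluation_metrics(y_a: np.ndarray, y_p: np.ndarray) -> dict:
    err = y_p - y_a
    mse = np.mean(err ** 2)                                             # Equation (1)
    rmse = np.sqrt(mse)                                                 # Equation (2)
    rmsle = np.sqrt(np.mean((np.log(y_p + 1) - np.log(y_a + 1)) ** 2))  # Equation (3)
    mae = np.mean(np.abs(err))                                          # Equation (4)
    mape = np.mean(np.abs(err / y_a))                                   # Equation (5)
    r2 = 1 - np.sum(err ** 2) / np.sum((y_a - y_a.mean()) ** 2)         # Equation (6)
    return {"MSE": mse, "RMSE": rmse, "RMSLE": rmsle,
            "MAE": mae, "MAPE": mape, "R2": r2}

# Example: evaluation_metrics(np.array([10.0, 12.0, 11.0]), np.array([9.5, 12.5, 10.0]))
```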

4. Results and Discussion

Moving on to the results, as was already pointed out, the purpose of this work was twofold. The aim was to investigate two separate case studies through an extensive experimental procedure. Below are the results of the experiments categorized into these two separate cases. The first section deals with the utilization of textual data in light of sentiment analysis for the task of time series forecasting and the investigation of whether or not and when their use has a beneficial effect on improving predictions. The second involves comparing the performance of different forecast algorithms, aiming to fill the corresponding gap in the literature, where although there is serious research effort, it mainly concerns the comparison of a small number of methods. Table A1 presents the 22 sentiment score scenarios along with their respective abbreviations.
Clearly, the large number of experiments makes any attempt to present the numerical results in their raw form, that is, as individual exported numerical predictions, impossible. It was therefore deemed necessary to use performance measures that are well known and, in some ways, established in similar comparisons and that capture the general behavior of each scenario. Moreover, it was already mentioned that the time series forecasting problem can be considered a regression one; accordingly, in the present research, six commonly accepted metrics were used. The use of several metrics was considered necessary, as each of them has advantages and disadvantages and highlights different aspects of the results, forming a diverse set of guides for their evaluation.
Regarding aggregate comparisons, the first way of monitoring the results in order to draw valid general conclusions was through the Friedman ranking test [70]. Thus, on the one hand, the H0 hypothesis—that is, that all 22 different scenarios produce similar results—could be tested, and on the other, it became possible to rank the methods based on their efficiency. The Friedman test is a non-parametric statistical test that checks whether the mean values of three or more treatments—in our case, the results of the twenty-two scenarios—differ significantly. Of the six metrics used, five involve errors (MSE, RMSE, RMSLE, MAE, MAPE), which means that, for one approach to be considered better than another, it must have a lower average. Therefore, the Friedman rankings for the error metrics follow an increasing order; the smaller the Friedman ranking score, the more efficient the method. The opposite holds only for R², where higher values indicate better performance.
After the Friedman test was performed, in case the null hypothesis was rejected—this rejection means that there is even one method that behaves differently—then the Bonferroni–Dunn post hoc test [71], also known as the Bonferroni inequality procedure, followed. This test generally reveals which pairs of treatments differ in their mean values, acting as follows: first the critical difference value is extracted, and then, for each pair of treatments, the absolute value of the difference in their rankings is calculated. If the latter is greater than or equal to the critical difference value, H0 is rejected, i.e., the corresponding treatments differ. The most efficient way to present the results of the Bonferroni inequality procedure is through CD-diagrams, where treatments whose performances do not differ are joined by horizontal dark lines. Below are tables with the results of the Friedman tests, boxplots with the error distributions, as well as CD-diagrams, which, due to the limited space available, show the relations between the top-10 best approaches according to the Friedman rankings.
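As a sketch of this aggregate comparison (assuming a matrix with one row per dataset/algorithm case and one column per compared treatment), the Friedman test and the average ranks can be obtained as follows; the Bonferroni–Dunn post hoc step and the drawing of the CD-diagrams are omitted here:

```python
import numpy as np
from scipy import stats

def friedman_ranking(results: np.ndarray, higher_is_better: bool = False):
    """results: (n_cases, n_treatments) metric values; returns statistic, p-value, mean ranks."""
    stat, p_value = stats.friedmanchisquare(*results.T)      # H0: all treatments perform similarly
    signed = -results if higher_is_better else results        # flip sign for R^2 (higher = better)
    ranks = np.apply_along_axis(stats.rankdata, 1, signed)    # rank the treatments within each case
    return stat, p_value, ranks.mean(axis=0)                  # smaller mean rank = better treatment

# Example with three treatments over twenty cases:
# stat, p, mean_ranks = friedman_ranking(np.random.rand(20, 3))
```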

4.1. Case Study: Sentiment Scores’ Comparison

Let us initially give a summary of the case. First, the aim was to answer whether and under what conditions the use of sentiment analysis in data derived from social media has a positive effect on the prediction of future prices of financial time series. Here, the combinations—seen in Table A1—of scores from three different sentiment analysis methods together with their seven-day rolling means and the univariate case created a total of twenty-two cases to compare. Table A2, Table A3 and Table A4 present the final Friedman rankings in terms of their corresponding single-day, seven-day and fourteen-day forecasts.

4.1.1. Single-Day Prediction

First, regarding the forecast for the next day only, Table A2 shows the general superiority of the univariate case over the use of sentiment analysis. As for the boxplots and CD-diagrams, the top ten combinations of sentiment time series are ranked for each metric presented, again with the univariate scenario dominating in performance (note that in the boxplots, the top-down layout is sorted by median).
One can also observe the statistical dependencies that emerged from the examination of each pair of cases. These dependencies can be further analyzed by comparing Table A2 with the representations in Figure 3. For example, it was observed that the statistical dependence between the univariate case and the setup that additionally uses TextBlob, shown in Figure 3, followed the ranking of the two versions extracted from the results in the Friedman tables. Figure 4 shows the performance distributions for each sentiment setup, i.e., all the values that resulted from applying a given setting to each dataset for each algorithm. Here, the apparent similarity of the performances of the methods is, on the one hand, a matter of the scale of the representation, while on the other, it reflects a possible uniformity. From all three different representations of the results, there was a predominance of the univariate version, followed by the use of TextBlob and FinBERT.

4.1.2. One-Week Prediction

However, in the case of weekly forecasts, one can observe, from Table A3 and Figure 5 and Figure 6, that things do not remain the same. There was a noticeable decline in the performance ranking of the univariate setup, with the simultaneous improvement of configurations that utilize sentiment scores.
In particular, in four of the metrics used, FinBERT seemed to be superior, while in the other two, the combination of FinBERT with TextBlob lay in first place of the ranking. Apart from that, Vader, Blob, and the combination of Vader and FinBERT seemed to perform almost equally to the above, as the differences in their corresponding rankings were minimal. In addition, regarding the use of rolling means, there seemed to be no particular improvement under the current framework except—in rare cases—when applied in combination with a raw sentiment score. The only representation of the results in which the univariate configuration appears in high positions is the boxplots, where the sorting of the layout is based only on the median of the values. In terms of Friedman scores, at best, it ranked sixth.

4.1.3. Two-Week Prediction

Results from the fourteen-day forecasts exhibited similar behavior as in the seven-day prediction case, except for the performance of the averaging schemes, some of which tended to move up to higher positions. Indeed, here, again, Friedman’s ranking in all evaluations seemed to suggest that the use of information extracted from social networks is beneficial under the current forecasting framework. In addition, there was an apparent improvement in schemes exploiting rolling means. This becomes easily noticeable in both Figure 7 and Figure 8, showing the CD-diagrams and boxplots, respectively, and in Table A4. One can observe the configuration of TextBlob that incorporates the weekly rolling mean to be in the first place of the Friedman ranking in terms of three valuations, that is in terms of the RMSE, MAE, and MAPE metrics. Thus, apart from the conclusions that can be drawn from the study of the representations of the results and that constitute evaluations similar in form to those of the above cases, something new seemed to emerge here: there was a gradual increase in the performance of the combinations that use weighted information. Moreover, this increase in performance seemed to be related to the long forecast period.

4.2. Case Study: Methods’ Comparison

We can now turn to the presentation of the results of the comparison of the algorithms. The reader is first asked to refer to Table 2, containing the methods with their respective abbreviations, as well as to Table A5, Table A6 and Table A7, containing the Friedman rankings. The Friedman rankings here are structured as a generalization derived from the performance of each algorithm in terms of each dataset and under each of the 22 input schemes.

4.2.1. One-Day Prediction

Starting with the simple one-day prediction, from the results presented in Table A5 and in Figure 9 and Figure 10, one can easily conclude an almost universal predominance of LSTM methods.
Regarding the three best-performing methods, the CD-diagrams show a statistical dependence between the LSTM and Bi-LSTM methods, while the scheme incorporating both the above algorithmic processes in a stacked configuration is presented as statistically independent of all. Given what has been reported about how these diagrams are derived, this independence can easily be traced to the differences in the results of the Friedman table, where the deviations between the methods are significant.
The latter is evident in the boxplots as well. Both the dispersion and the values of the evaluations of the top-three methods stand out clearly from those of all the other techniques.

4.2.2. One-Week Prediction

It can be observed that the same interpretation applies in the case of weekly forecasts. Again, in all metrics, the top-three best-performing methods were the three LSTM variants (Figure 11). Table A6 depicts both the latter and the distinctions presented on the CD-diagrams of Figure 12. Essentially, however, a simple comparison of the representations of the results showed that in all cases, the predominant methods were by far the LSTM and Bi-LSTM procedures.
In the boxplots, despite the fact that the LSTM variants appear as if they tend to form a group of similarly performing methods, the Friedman scores point to the independence—in terms of the evaluation of numerical outputs—of only the top-two aforementioned methods from all the others. Thus, based on these results, it is relatively easy to suggest a clear choice of strategy in terms of methods.

4.2.3. Two-Week Prediction

Finally, regarding the case of the 14-day forecasts, the general remarks given in the previous section can be extended here as well. The results can be found below, in Figure 13 and Figure 14, as well as in Table A7.
An additional final remark, however, should be made: in the boxplots of the R² results, there seems to be a difference in the median ranking. This difference, however, was not reflected in the Friedman scores.

4.3. Discussion

Having presented the results, below are some general remarks. Here, the following discussion is structured according to the bilateral distinction of the case studies presented and contains summarizing comments regarding elements that preceded:
  • Sentiment setups: The main point that emerged from the above results has to do with the fact that the use of sentiment analysis seemed to improve the models when used for long-term predictions. Thus, while the univariate configuration is seen as more efficient in one-step predictions, when the predictions were applied to the seven-day and fourteen-day cases, the use of sentiment scores under a multivariate topology seemed to improve the forecasts overall. Specifically, in the weekly forecasts, all three single-sentiment-score setups outperformed the univariate configuration, with FinBERT performing best in terms of the MSE, RMSE, RMSLE, and R², while the combination of Blob and FinBERT outperformed the rest in the MAE and MAPE. When the prediction shift doubled to 14 days, Blob and Rolling Mean 7 Blob dominated the other sentiment configurations, followed by the combination of Blob and FinBERT, as well as FinBERT. Vader appeared to rank lower in all metrics and was, therefore, weaker than in the previous two cases.
    However, two general questions need to and can be answered by looking at the results. These are not about choosing an algorithm, as one can assume that in a working scenario where reliable predictions would be needed, one would have a number of methods at one’s disposal. Thus, this is a query about a reliable methodology. Therefore, first of all, one should evaluate whether the use of sentiment scores helps and, if so, in which cases. Second, an answer must be provided as to what form the sentiment score time series should have depending on the forecasting case. Regarding the first question, the answer seems to be clear: multivariate configurations improve forecasts in non-trivial forecast cases. As for the second one, it seems that, in cases of long-term forecasts, an argument in favor of the use of rolling mean can be substantiated. Concluding, it should be noted that when the forecast window grows, then even seemingly small improvements, such as those seen through the use of sentiment analysis, can be of particular importance;
  • Algorithms: As for the algorithms, the comparisons seemed to provide direct and clear interpretations. From the results here, it is also possible to safely substantiate—at least—a central conclusion. It is apparent that in all scenarios, the configurations exploiting neural networks—that is, LSTM variations—were superior in terms of performance to the classical regression algorithms. Among them, LSTM outperformed the BiLSTM architecture in every single case, while the stacked combination of the two followed. In addition, the aforementioned superiority of the two dominant methods was clear, with their performance forming a threshold, below which—and at a considerable distance—all the other methods examined were placed. Therefore, concluding, if one considers that the neural network architectures used did not contain sophisticated configurations—in terms of, for example, depth—then, on the basis that any additional computational costs become negligible, the use of LSTMs constitutes the clear choice.

5. Conclusions

In this work, a study of the exploitation of sentiment scores in various multivariate time-series-forecasting schemes regarding financial data was conducted. The overall structure and results of an extensive experimental procedure were presented, in which 22 different input configurations were tested, utilizing information extracted from social networks, in a total of 16 different datasets, using 27 different algorithms. The survey consisted of two case studies, the first of which was to investigate the performance of various multivariate time series forecasting schemes utilizing sentiment analysis and the second to compare the performance of a large number of machine-learning algorithms using the aforementioned multivariate input setups.
From the results, and in relation to the first case study, that is, after the use of sentiment analysis configurations, a conditional performance improvement can be safely deduced in cases where the methods were applied to predict long-term time frames. Of all the sentiment score combinations tested, the TextBlob and FinBERT variations generally appeared to perform best. In addition, there was a gradual improvement in the performance of combinations containing rolling averages as the forecast window grew. This may imply that a broader study of the use of different versions of the same time series in a range of different multivariate configurations may reveal methodological strategies as to how to exploit input data manipulations to increase accuracy.
Regarding the second case study, the results indicated a clear predominance of LSTM variations. In particular, this superiority became even clearer in terms of its generalization when the basic configurations of the architectures used in the neural networks under consideration were taken into account, which means that any computational cost cannot be a counterweight to the dominance of the LSTM methods.

Author Contributions

Conceptualization, S.K.; methodology, C.M.L.; software, C.M.L.; validation, C.M.L. and A.K.; formal analysis, C.M.L. and A.K.; investigation, C.M.L. and A.K.; resources, A.K. and S.K.; data curation, A.K.; writing—original draft preparation, C.M.L.; writing—review and editing, C.M.L.; visualization, C.M.L. and A.K.; supervision, S.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

URL of the full Friedman Ranking results: https://bit.ly/2XlBNvL (accessed on 1 November 2021).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Friedman Rankings

Some of the results in the tables below are slightly truncated due to space limitations. The full results of the Friedman rankings can be found at the following URL: https://bit.ly/2XlBNvL (accessed on 1 November 2021).

Appendix A.1. Sentiments

Table A1. Sentiment score setups.
No. | Abbreviation | Sentiment Score Setup
1 | NS | No Sentiment
2 | B | TextBlob
3 | V | Vader
4 | F | FinBERT
5 | B7 | Rolling Mean 7 TextBlob
6 | V7 | Rolling Mean 7 Vader
7 | F7 | Rolling Mean 7 FinBERT
8 | BV | TextBlob and Vader
9 | BF | TextBlob and FinBERT
10 | BB7 | TextBlob and Rolling Mean 7 TextBlob
11 | BV7 | TextBlob and Rolling Mean 7 Vader
12 | BF7 | TextBlob and Rolling Mean 7 FinBERT
13 | VF | Vader and FinBERT
14 | VB7 | Vader and Rolling Mean 7 TextBlob
15 | VV7 | Vader and Rolling Mean 7 Vader
16 | VF7 | Vader and Rolling Mean 7 FinBERT
17 | FB7 | FinBERT and Rolling Mean 7 TextBlob
18 | FV7 | FinBERT and Rolling Mean 7 Vader
19 | FF7 | FinBERT and Rolling Mean 7 FinBERT
20 | B7V7 | Rolling Mean 7 TextBlob and Rolling Mean 7 Vader
21 | B7F7 | Rolling Mean 7 TextBlob and Rolling Mean 7 FinBERT
22 | V7F7 | Rolling Mean 7 Vader and Rolling Mean 7 FinBERT
Table A2. Sentiment scenarios’ Friedman rankings (shift = 1).
MSE | RMSE | RMSLE
Setup | F-Rank | Setup | F-Rank | Setup | F-Rank
1NS9.445601852NS9.440972222NS9.420138889
2B10.35763889B10.35763889B9.710648148
3F10.3576389F10.3576389F10.30092593
4BB710.73263889BB710.73032407V10.64351852
5B710.74305556B710.74537037BB710.77546296
6V10.81018519V10.77546296B710.81365741
7BV11.22569444BV11.19560185BV10.99768519
8V711.35416667V711.33564815BF11.28356481
9BF11.40740741FB711.42013889VF11.44212963
10FB711.42361111BF11.44444444FB711.48842593
11VB711.43634259VB711.4525463V711.50694444
12VF11.48148148VF11.50231481VB711.57060185
13VV711.66087963VV711.66550926F711.66782407
14F711.76967593F711.77199074FF711.78125
15FF711.84490741FF711.83449074VV711.92013889
16BV712.01967593BV711.97800926BV712.18402778
17BF712.15740741BF712.2037037BF712.18865741
18VF712.28009259VF712.2662037VF712.21527778
19FV712.46759259FV712.51157407B7V712.61689815
20B7F712.78587963B7F712.76967593FV712.7337963
21V7F712.85532407B7V712.85763889B7F712.78009259
22B7V712.86689815V7F712.86226852V7F712.95833333
MAE | MAPE | R²
Setup | F-Rank | Setup | F-Rank | Setup | F-Rank
1NS9.591435185NS9.503472222NS13.55208333
2B9.688657407B9.616898148B13.12615741
3F10.16782407F10.05671296F12.6400463
4V10.79050926V10.72337963BB712.26736111
5B710.82175926B710.73842593B712.25810185
6BB710.85532407BB710.78356481V12.19097222
7BV10.8599537BV10.79398148BV11.77430556
8BF11.21759259BF11.19907407V711.64930556
9V711.32986111V711.36574074BF11.59259259
10FB711.38773148FB711.37615741FB711.57638889
11VF11.48611111VF11.5VB711.56365741
12VB711.6400463VB711.66898148VF11.51851852
13F711.69791667FF711.72569444VV711.33680556
14FF711.69791667F711.80324074F711.22916667
15VV711.87847222VV711.92476852FF711.15625
16BV712.11689815BV712.1875BV710.98032407
17BF712.19212963BF712.28240741BF710.84259259
18VF712.34027778VF712.28240741VF710.72106481
19FV712.35648148FV712.42013889FV710.53240741
20B7F712.72337963B7F712.64699074B7F710.21412037
21V7F713.01041667V7F713.09953704V7F710.14467593
22B7V713.14930556B7V713.17592593B7V710.13310185
Table A3. Sentiment scenarios’ Friedman rankings (shift = 7).
MSE | RMSE | RMSLE
Setup | F-Rank | Setup | F-Rank | Setup | F-Rank
1F10.45486111F10.46875F10.44444444
2BF10.5162037BF10.53009259BF10.46412037
3V10.64930556V10.62847222V10.75231481
4VF10.90509259VF10.91203704VF10.78356481
5B10.9224537B10.91782407B10.84953704
6NS11.06828704NS11.09375NS10.96064815
7BV11.19907407BV11.18518519BV11.14583333
8B711.23263889B711.23958333B711.16087963
9FV711.34143519FV711.36689815BF711.30902778
10VV711.3900463VV711.40162037BB711.42476852
11FF711.52199074FF711.52083333FF711.44444444
12BF711.54398148BF711.52314815VB711.47685185
13BB711.5625BB711.54398148FB711.52430556
14FB711.6087963FB711.62384259VV711.67708333
15BV711.71064815BV711.69907407VF711.72222222
16VB711.73958333VB711.73958333FV711.74421296
17V711.76967593V711.7650463BV711.89814815
18VF711.87847222VF711.85300926F712.16782407
19F712.1412037F712.14583333V712.23842593
20B7V712.54861111B7V712.55555556B7V712.52546296
21V7F712.64236111V7F712.62847222B7F712.56944444
22B7F712.65277778B7F712.65740741V7F712.71643519
MAE | MAPE | R²
Setup | F-Rank | Setup | F-Rank | Setup | F-Rank
1BF10.45949074BF10.38078704F12.54513889
2B10.67939815B10.67824074BF12.4849537
3F10.68634259V10.70023148V12.35069444
4V10.71064815F10.73958333VF12.09490741
5B710.85532407B710.85300926B12.0775463
6BB710.87847222BV10.86921296NS11.93171296
7BV10.88310185VF10.90162037BV11.80092593
8VF10.9849537BB710.92476852B711.76736111
9NS11.02893519NS11.00925926FV711.65972222
10VB711.1875VB711.03472222VV711.6099537
11FB711.42013889FB711.42708333FF711.47800926
12BF711.62037037BF711.52314815BF711.4537037
13VV711.74189815VV711.7650463BB711.43865741
14FF711.77314815FF711.78009259FB711.3912037
15FV711.85300926VF711.90046296BV711.28587963
16VF711.90972222FV712.0474537VB711.26041667
17BV711.91319444BV712.08101852V711.23032407
18V712.17592593V712.13078704VF711.12152778
19F712.26388889F712.22685185F710.8587963
20B7F712.5B7F712.53935185B7V710.4525463
21B7V712.63310185B7V712.54166667V7F710.3587963
22V7F712.84143519V7F712.94560185B7F710.34722222
Table A4. Sentiment scenarios’ Friedman rankings (shift = 14).
MSE | RMSE | RMSLE
Setup | F-Rank | Setup | F-Rank | Setup | F-Rank
1B10.48726852B710.50810185BF10.46527778
2B710.50462963B10.52777778B10.53125
3BF10.56597222BF10.55208333B710.60300926
4F10.70949074F10.71990741F10.66550926
5V10.75925926V10.74884259BB710.70717593
6BB710.7974537BB710.80208333V10.92476852
7NS10.92592593NS10.89583333FB710.92939815
8FB711.10185185FB711.09027778NS11.04976852
9VB711.25694444VB711.28240741VB711.41319444
10BV11.3275463BV11.3275463VF11.45601852
11VF11.37615741VF11.37152778FF711.46296296
12VV711.52777778VV711.52083333BV11.46990741
13V711.70023148V711.71064815VV711.75578704
14FF711.75925926FF711.75231481BF711.78240741
15BV711.84722222BV711.84143519BV711.87268519
16BF712.00115741BF711.99652778F711.92708333
17B7F712.01851852B7F712.02083333V711.94560185
18B7V712.05092593FV712.05671296FV712.12847222
19FV712.0625B7V712.06597222B7F712.15509259
20F712.30555556F712.30671296VF712.21180556
21VF712.47106481VF712.45486111B7V712.27546296
22V7F713.44328704V7F713.44675926V7F713.26736111
MAE | MAPE | R²
Setup | F-Rank | Setup | F-Rank | Setup | F-Rank
1B710.51157407B710.50694444B12.50925926
2B10.53125B10.60532407B712.49652778
3BF10.59490741BB710.64351852BF12.43171296
4BB710.62962963BF10.70601852F12.29166667
5F10.69328704F10.74189815V12.24074074
6V10.79513889NS10.88425926BB712.20717593
7NS10.94907407V10.93634259NS12.07407407
8FB710.99421296FB711.03356481FB711.89583333
9BV11.41087963VV711.44212963VB711.74305556
10VV711.42824074FF711.47337963BV11.6724537
11V711.43055556VB711.49537037VF11.62384259
12VB711.44675926V711.50925926VV711.47222222
13VF11.59375VF11.66319444V711.29976852
14BV711.6400463BV11.68634259FF711.23842593
15FF711.67476852FV711.80208333BV711.15277778
16FV711.80902778BF711.81944444BF711.00347222
17BF712.00231481BV711.86458333B7F710.97916667
18B7F712.17013889B7F712.10763889B7V710.94907407
19F712.39351852F712.14583333FV710.9375
20B7V712.46643519VF712.36689815F710.69560185
21VF712.52662037B7V712.53587963VF710.52893519
22V7F713.30787037V7F713.03009259V7F79.556712963

Appendix A.2. Algorithms

Table A5. Methods’ Friedman rankings (shift = 1).
MSE | RMSE | RMSLE
Method | F-Rank | Method | F-Rank | Method | F-Rank
1LSTM3.409090909LSTM3.184659091LSTM3.178977273
2LSTM_23.954545455LSTM_23.786931818LSTM_23.801136364
3LSTM_36.34375LSTM_35.977272727LSTM_36.065340909
4GB9.048295455GB9.076704545GB9.105113636
5LGBM9.923295455LGBM9.96875LGBM10.10511364
6ET10.17045455ET10.22443182ET10.625
7RF10.64488636RF10.6875RF10.87215909
8MLP11.13636364MLP11.17897727MLP10.95738636
9CBR11.23863636CBR11.25852273CBR11.47443182
10XGBoost12.26420455XGBoost12.30397727XGBoost12.63352273
11ARD13.11647727ARD13.16761364ARD13.5625
12OMP13.21164773OMP13.26846591OMP13.70596591
13LA13.66619318LA13.71164773LA14.18323864
14RDG13.77272727RDG13.81818182RDG14.21875
15LNR13.78267045LNR13.828125LNR14.29829545
16ABR14.68465909ABR14.71022727ABR15.11079545
17DTR15.34090909DTR15.36079545DTR15.64772727
18KNR16.19034091KNR16.21022727KNR15.99431818
19RSC16.31676136RSC16.35085227RSC16.50568182
20HBR16.91477273HBR16.96022727HBR17.30397727
21SVR17.85511364SVR17.86079545THS17.51704545
22THS18.11647727LAS18.13068182SVR17.63068182
MAE | MAPE | R²
Method | F-Rank | Method | F-Rank | Method | F-Rank
1LSTM3.113636364LSTM2.960227273LSTM24.34375
2LSTM_23.835227273LSTM_23.747159091LSTM_223.79545455
3LSTM_36.423295455LSTM_36.346590909LSTM_321.36931818
4GB8.448863636GB8.360795455GB19.00568182
5ET9.889204545ET10.01988636LGBM18.15340909
6LGBM9.997159091LGBM10.21875ET17.84943182
7RF10.24431818RF10.34943182RF17.38636364
8CBR10.69318182MLP10.78409091MLP16.88636364
9MLP10.81818182CBR10.80681818CBR16.76988636
10XGBoost12.02840909XGBoost12.11079545XGBoost15.75568182
11ARD13.83522727ARD13.75568182ARD14.94602273
12OMP13.96164773OMP13.87642045OMP14.84517045
13LA14.20880682LA14.140625LA14.40198864
14RDG14.3125RDG14.28125RDG14.29545455
15LNR14.34943182LNR14.29119318LNR14.28551136
16ABR14.67613636ABR14.69318182ABR13.33806818
17DTR14.94034091DTR15.05113636DTR12.66477273
18KNR15.53693182KNR15.36931818KNR11.85227273
19SVR15.73579545SVR15.84375RSC11.73153409
20RSC16.81534091RSC16.77982955HBR11.13352273
21HBR17.45170455HBR17.44034091SVR10.17045455
22THS18.65340909LAS18.62215909THS9.909090909
Table A6. Methods’ Friedman rankings (shift = 7).

No. | Method (MSE) | F-Rank | Method (RMSE) | F-Rank | Method (RMSLE) | F-Rank
1 | LSTM | 4.940340909 | LSTM | 4.852272727 | LSTM | 4.491477273
2 | LSTM_2 | 5.272727273 | LSTM_2 | 5.235795455 | LSTM_2 | 4.823863636
3 | LSTM_3 | 8.241477273 | LSTM_3 | 8.005681818 | LSTM_3 | 7.696022727
4 | OMP | 10.31960227 | OMP | 10.33664773 | MLP | 9.9375
5 | ARD | 10.60511364 | ARD | 10.61931818 | OMP | 11.07528409
6 | MLP | 10.61931818 | MLP | 10.64488636 | ARD | 11.38068182
7 | LA | 10.96732955 | LA | 10.98153409 | LA | 11.63778409
8 | LNR | 11.26988636 | LNR | 11.28409091 | LNR | 11.98295455
9 | RDG | 11.34375 | RDG | 11.35795455 | GB | 11.98863636
10 | GB | 12.14772727 | GB | 12.17329545 | RDG | 12.05397727
11 | LGBM | 12.72443182 | LGBM | 12.75 | LGBM | 12.91193182
12 | ET | 13.65909091 | ET | 13.68181818 | ABR | 13.69034091
13 | HBR | 13.67613636 | HBR | 13.69318182 | ET | 13.88920455
14 | CBR | 13.90909091 | CBR | 13.90909091 | CBR | 13.89488636
15 | ABR | 14.25284091 | ABR | 14.27272727 | RF | 14.48295455
16 | RF | 14.38920455 | RF | 14.40056818 | THS | 14.59232955
17 | THS | 14.44886364 | THS | 14.46306818 | HBR | 14.70454545
18 | RSC | 14.94602273 | RSC | 14.97159091 | LAS | 15.48295455
19 | LAS | 16.10795455 | LAS | 16.13636364 | RSC | 15.56107955
20 | KNR | 16.19602273 | KNR | 16.21590909 | KNR | 16.00568182
21 | XGBoost | 16.25284091 | XGBoost | 16.26136364 | XGBoost | 16.63068182
22 | DTR | 17.75568182 | DTR | 17.76988636 | SVR | 17.07386364

No. | Method (MAE) | F-Rank | Method (MAPE) | F-Rank | Method (R²) | F-Rank
1 | LSTM | 3.900568182 | LSTM | 4.022727273 | LSTM | 22.47443182
2 | LSTM_2 | 4.340909091 | LSTM_2 | 4.514204545 | LSTM_2 | 22.20454545
3 | LSTM_3 | 7.414772727 | LSTM_3 | 7.400568182 | LSTM_3 | 18.99715909
4 | MLP | 9.636363636 | MLP | 9.769886364 | OMP | 17.66903409
5 | OMP | 11.16619318 | GB | 11.14772727 | MLP | 17.59090909
6 | GB | 11.26136364 | OMP | 11.49857955 | ARD | 17.38636364
7 | ARD | 11.66761364 | LGBM | 11.84659091 | LA | 17.03267045
8 | LA | 11.88210227 | ARD | 11.94602273 | LNR | 16.73011364
9 | LGBM | 12.06818182 | LA | 12.14914773 | RDG | 16.65056818
10 | LNR | 12.25568182 | LNR | 12.53409091 | GB | 15.98579545
11 | RDG | 12.44602273 | RDG | 12.67045455 | LGBM | 15.40340909
12 | CBR | 12.99715909 | CBR | 12.95738636 | ET | 14.50568182
13 | ET | 13.21306818 | ET | 12.97159091 | HBR | 14.29545455
14 | ABR | 13.60795455 | ABR | 13.26704545 | CBR | 14.26704545
15 | RF | 13.73295455 | RF | 13.31534091 | ABR | 13.84943182
16 | HBR | 14.91477273 | KNR | 15.00852273 | RF | 13.82954545
17 | KNR | 15.39204545 | HBR | 15.18465909 | THS | 13.51136364
18 | THS | 15.39204545 | XGBoost | 15.34943182 | RSC | 13.13068182
19 | XGBoost | 15.57954545 | THS | 15.59090909 | KNR | 11.96590909
20 | RSC | 15.96875 | SVR | 16.13636364 | XGBoost | 11.94034091
21 | SVR | 16.26420455 | RSC | 16.21590909 | LAS | 11.90625
22 | LAS | 17.42613636 | LAS | 17.22727273 | SVR | 10.32102273
Table A7. Methods’ Friedman rankings (shift = 14).

No. | Method (MSE) | F-Rank | Method (RMSE) | F-Rank | Method (RMSLE) | F-Rank
1 | LSTM | 5.488636364 | LSTM | 5.363636364 | LSTM | 4.946022727
2 | LSTM_2 | 5.721590909 | LSTM_2 | 5.588068182 | LSTM_2 | 5.113636364
3 | OMP | 8.673295455 | LSTM_3 | 8.443181818 | LSTM_3 | 8.048295455
4 | LSTM_3 | 8.678977273 | OMP | 8.701704545 | MLP | 9.488636364
5 | ARD | 9.116477273 | ARD | 9.15625 | OMP | 9.53125
6 | LA | 10.06107955 | LA | 10.09232955 | ARD | 9.840909091
7 | MLP | 10.18465909 | MLP | 10.20170455 | LA | 10.64914773
8 | LNR | 10.20880682 | LNR | 10.24005682 | LNR | 10.890625
9 | RDG | 10.29829545 | RDG | 10.32954545 | RDG | 11.01988636
10 | HBR | 11.86363636 | HBR | 11.88920455 | HBR | 12.58522727
11 | ABR | 13.26420455 | ABR | 13.29829545 | ABR | 13.15909091
12 | RSC | 13.46306818 | RSC | 13.48579545 | THS | 13.72301136
13 | THS | 13.53693182 | THS | 13.55113636 | LAS | 13.75
14 | GB | 13.98011364 | GB | 13.99431818 | RSC | 13.78267045
15 | LAS | 14.42329545 | LAS | 14.47443182 | GB | 14.16761364
16 | LGBM | 15.03977273 | LGBM | 15.06534091 | LGBM | 15.36079545
17 | RF | 16.18465909 | RF | 16.20170455 | ET | 16.46022727
18 | ET | 16.20170455 | CBR | 16.21590909 | CBR | 16.54261364
19 | CBR | 16.20454545 | ET | 16.22443182 | RF | 16.5625
20 | KNR | 16.54545455 | KNR | 16.57102273 | KNR | 16.61079545
21 | SVR | 17.40340909 | SVR | 17.41761364 | SVR | 17.04829545
22 | XGBoost | 17.73011364 | XGBoost | 17.75 | ELN | 17.54261364

No. | Method (MAE) | F-Rank | Method (MAPE) | F-Rank | Method (R²) | F-Rank
1 | LSTM | 5.539772727 | LSTM | 5.15625 | LSTM | 20.69318182
2 | LSTM_2 | 5.732954545 | LSTM_2 | 5.321022727 | LSTM_2 | 20.46875
3 | LSTM_3 | 8.784090909 | LSTM_3 | 8.457386364 | OMP | 19.64204545
4 | OMP | 8.877840909 | MLP | 9.170454545 | ARD | 19.16193182
5 | ARD | 9.176136364 | OMP | 9.363636364 | LA | 18.21164773
6 | MLP | 9.295454545 | ARD | 9.653409091 | LNR | 18.06392045
7 | LA | 10.08664773 | LA | 10.43892045 | MLP | 18.04829545
8 | LNR | 10.27414773 | LNR | 10.640625 | RDG | 17.97159091
9 | RDG | 10.41761364 | RDG | 10.81818182 | LSTM_3 | 17.69602273
10 | HBR | 11.94318182 | HBR | 12.39488636 | HBR | 16.48295455
11 | ABR | 13.18465909 | ABR | 12.83806818 | ABR | 14.87215909
12 | THS | 13.73863636 | GB | 13.61931818 | RSC | 14.76704545
13 | GB | 13.76988636 | LGBM | 14.16477273 | THS | 14.5
14 | RSC | 13.82102273 | RSC | 14.24431818 | GB | 14.24715909
15 | LGBM | 14.39204545 | THS | 14.28409091 | LAS | 13.97727273
16 | CBR | 15.61363636 | CBR | 15.29545455 | LGBM | 13.19034091
17 | ET | 15.76704545 | ET | 15.42613636 | RF | 12.11363636
18 | RF | 15.84090909 | RF | 15.44886364 | ET | 12.07670455
19 | LAS | 15.97159091 | KNR | 15.81534091 | CBR | 12.01704545
20 | KNR | 15.97443182 | LAS | 16.21875 | KNR | 11.71022727
21 | SVR | 16.46022727 | SVR | 16.35795455 | SVR | 10.80965909
22 | XGBoost | 17.72159091 | XGBoost | 17.42045455 | XGBoost | 10.55397727
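The F-Rank values reported in Tables A5–A7 are Friedman-style average ranks [70]: for each evaluation case the competing methods are ranked by the given metric and the ranks are then averaged over all cases. The sketch below shows one way such average ranks can be computed; it assumes an `errors` array of shape (n_cases, n_methods) in which lower values are better (for R², the sign would be reversed), and it is an illustration rather than the paper's exact evaluation code.

```python
# Minimal sketch of Friedman average ranks, i.e., the kind of values shown in the F-Rank columns.
# `errors` is assumed to have shape (n_cases, n_methods); lower values are better.
import numpy as np
from scipy.stats import rankdata, friedmanchisquare

def friedman_average_ranks(errors: np.ndarray) -> np.ndarray:
    # Rank the methods within every case (row); ties receive average ranks.
    ranks = np.apply_along_axis(rankdata, 1, errors)
    return ranks.mean(axis=0)  # one average rank per method

# The Friedman test itself [70] checks whether the observed rank differences are significant:
# statistic, p_value = friedmanchisquare(*errors.T)
```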

References

  1. Wei, W.W.S. Time Series Analysis: Univariate and Multivariate Methods; Pearson Addison Wesley: Boston, MA, USA, 2018.
  2. Hong, W.-C. Rainfall Forecasting by Technological Machine Learning Models. Appl. Math. Comput. 2008, 200, 41–57.
  3. Chukwudike, N.; Ugoala, C.B.; Maxwell, O.; Okezie, U.-I.; Bright, O.; Henry, U. Forecasting Monthly Prices of Gold Using Artificial Neural Network. J. Stat. Econom. Methods 2020, 9, 19–28.
  4. Liu, H.; Long, Z. An Improved Deep Learning Model for Predicting Stock Market Price Time Series. Digit. Signal Process. 2020, 102, 102741.
  5. Liapis, C.M.; Karanikola, A.; Kotsiantis, S. An Ensemble Forecasting Method Using Univariate Time Series COVID-19 Data. ACM Int. Conf. Proc. Ser. 2020, 50–52.
  6. Shahid, F.; Zameer, A.; Muneeb, M. Predictions for COVID-19 with Deep Learning Models of LSTM, GRU and Bi-LSTM. Chaos Solit. Fract. 2020, 140, 110212.
  7. Khemchandani, R.; Chandra, S. Regularized Least Squares Fuzzy Support Vector Regression for Financial Time Series Forecasting. Expert Syst. Appl. 2009, 36, 132–138.
  8. Ban, T.; Zhang, R.; Pang, S.; Sarrafzadeh, A.; Inoue, D. Referential KNN Regression for Financial Time Series Forecasting. Lect. Notes Comput. Sci. 2013, 8226, 601–608.
  9. Sagheer, A.; Kotb, M. Time Series Forecasting of Petroleum Production Using Deep LSTM Recurrent Networks. Neurocomputing 2019, 323, 203–213.
  10. Alhussein, M.; Aurangzeb, K.; Haider, S.I. Hybrid CNN-LSTM Model for Short-Term Individual Household Load Forecasting. IEEE Access 2020, 8, 180544–180557.
  11. Karanikola, A.; Liapis, C.M.; Kotsiantis, S. A Comparison of Contemporary Methods on Univariate Time Series Forecasting. In Advances in Machine Learning/Deep Learning-Based Technologies; Springer: Berlin, Germany, 2022; pp. 143–168.
  12. Kazmaier, J.; van Vuuren, J.H. A Generic Framework for Sentiment Analysis: Leveraging Opinion-Bearing Data to Inform Decision Making. Decis. Support Syst. 2020, 135, 113304.
  13. Li, L.; Goh, T.T.; Jin, D. How Textual Quality of Online Reviews Affect Classification Performance: A Case of Deep Learning Sentiment Analysis. Neural Comput. Appl. 2020, 32, 4387–4415.
  14. Zhang, L.; Zhang, L.; Xiao, K.; Liu, Q. Forecasting Price Shocks with Social Attention and Sentiment Analysis. In Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), San Francisco, CA, USA, 18–21 August 2016.
  15. Kedar, S.V. Stock Market Increase and Decrease Using Twitter Sentiment Analysis and ARIMA Model. Turkish J. Comput. Math. Educ. 2021, 12, 146–161.
  16. Huang, J.Y.; Liu, J.H. Using Social Media Mining Technology to Improve Stock Price Forecast Accuracy. J. Forecast. 2020, 39, 104–116.
  17. Shi, Y.; Zheng, Y.; Guo, K.; Ren, X. Stock Movement Prediction with Sentiment Analysis Based on Deep Learning Networks. Concurr. Comput. 2021, 33, 1–16.
  18. Pano, T.; Kashef, R. A Complete VADER-Based Sentiment Analysis of Bitcoin (BTC) Tweets during the Era of COVID-19. Big Data Cogn. Comput. 2020, 4, 33.
  19. Wang, Y. Stock Market Forecasting with Financial Micro-Blog Based on Sentiment and Time Series Analysis. J. Shanghai Jiaotong Univ. 2017, 22, 173–179.
  20. Bharathi, S.; Geetha, A. Sentiment Analysis for Effective Stock Market Prediction. Int. J. Intell. Eng. Syst. 2017, 10, 146–154.
  21. Barman, A. Time Series Analysis and Forecasting of COVID-19 Cases Using LSTM and ARIMA Models. arXiv 2020, arXiv:2006.13852.
  22. Lara-Benítez, P.; Carranza-García, M.; Riquelme, J.C. An Experimental Review on Deep Learning Architectures for Time Series Forecasting. Int. J. Neural Syst. 2021, 31.
  23. Jin, Z.; Yang, Y.; Liu, Y. Stock Closing Price Prediction Based on Sentiment Analysis and LSTM. Neural Comput. Appl. 2020, 32, 9713–9729.
  24. Zhang, G.; Xu, L.; Xue, Y. Model and Forecast Stock Market Behavior Integrating Investor Sentiment Analysis and Transaction Data. Cluster Comput. 2017, 20, 789–803.
  25. Kaushik, S.; Choudhury, A.; Sheron, P.K.; Dasgupta, N.; Natarajan, S.; Pickett, L.A.; Dutt, V. AI in Healthcare: Time-Series Forecasting Using Statistical, Neural, and Ensemble Architectures. Front. Big Data 2020, 3, 4.
  26. Zhang, G.; Guo, J. A Novel Ensemble Method for Hourly Residential Electricity Consumption Forecasting by Imaging Time Series. Energy 2020, 203, 117858.
  27. Deorukhkar, O.S.; Lokhande, S.H.; Nayak, V.R.; Chougule, A.A. Stock Price Prediction Using Combination of LSTM Neural Networks, ARIMA and Sentiment Analysis. Int. Res. J. Eng. Technol. 2008, 3497, 3497–3503.
  28. Pasupulety, U.; Abdullah Anees, A.; Anmol, S.; Mohan, B.R. Predicting Stock Prices Using Ensemble Learning and Sentiment Analysis. In Proceedings of the 2019 IEEE Second International Conference on Artificial Intelligence and Knowledge Engineering (AIKE), Sardinia, Italy, 3–5 June 2019.
  29. Pimprikar, R.; Ramachandra, S.; Senthilkuma, K. Use of Machine Learning Algorithms and Twitter Sentiment Analysis for Stock Market Prediction. Int. J. Pure Appl. Math. 2017, 115, 521–526.
  30. Jadhav, R.; Wakode, M.S. Survey: Sentiment Analysis of Twitter Data for Stock Market Prediction. IJARCCE 2017, 6, 558–562.
  31. Twintproject/Twint. Available online: https://github.com/twintproject/twint (accessed on 5 October 2021).
  32. Van Rossum, G. The Python Library Reference, Release 3.8.2; Python Software Foundation: Wilmington, DE, USA, 2020.
  33. Bird, S. NLTK: The Natural Language Toolkit. In Proceedings of the COLING/ACL on Interactive Presentation Sessions; Association for Computational Linguistics: Stroudsburg, PA, USA, 2006; pp. 69–72.
  34. Bird, S.; Klein, E.; Loper, E. Natural Language Processing with Python; O’Reilly Media: Sebastopol, CA, USA, 2009.
  35. String—Common String Operations. Available online: https://docs.python.org/3/library/string.html (accessed on 5 October 2021).
  36. TextBlob: Simplified Text Processing. Available online: https://textblob.readthedocs.io/en/dev/ (accessed on 5 October 2021).
  37. Hutto, C.J.; Gilbert, E. VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text. ICWSM 2014, 8, 216–225.
  38. Araci, D. FinBERT: Financial Sentiment Analysis with Pre-Trained Language Models. Available online: https://arxiv.org/abs/1908.10063 (accessed on 5 October 2021).
  39. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019); 2019; Volume 1, pp. 4171–4186.
  40. ProsusAI/finBERT. Available online: https://github.com/ProsusAI/finBERT (accessed on 5 October 2021).
  41. Malo, P.; Sinha, A.; Korhonen, P.J.; Wallenius, J.; Takala, P. Good Debt or Bad Debt: Detecting Semantic Orientations in Economic Texts. J. Assoc. Inf. Sci. Technol. 2014, 65, 782–796.
  42. Drucker, H. Improving Regressors Using Boosting Techniques. In Proceedings of the Fourteenth International Conference on Machine Learning; Morgan Kaufmann: San Francisco, CA, USA, 1997; pp. 107–115.
  43. Wipf, D.; Nagarajan, S. A New View of Automatic Relevance Determination. In Advances in Neural Information Processing Systems; Platt, J., Koller, D., Singer, Y., Roweis, S., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2008; Volume 20.
  44. Graves, A.; Fernández, S.; Schmidhuber, J. Bidirectional LSTM Networks for Improved Phoneme Classification and Recognition. In Proceedings of the International Conference on Artificial Neural Networks; Springer: Berlin, Germany, 2005.
  45. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
  46. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased Boosting with Categorical Features. arXiv 2019, arXiv:1706.09516v5.
  47. Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees; Routledge: Oxfordshire, UK, 2017.
  48. Zou, H.; Hastie, T. Regularization and Variable Selection via the Elastic Net. J. R. Stat. Soc. Ser. B Stat. Methodol. 2005, 67, 301–320.
  49. Geurts, P.; Ernst, D.; Wehenkel, L. Extremely Randomized Trees. Mach. Learn. 2006, 63, 3–42.
  50. Chen, T.; He, T.; Benesty, M.; Khotilovich, V.; Tang, Y.; Cho, H. Xgboost: Extreme Gradient Boosting. R Packag. Version 0.4-2 2015, 1, 1–4.
  51. Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189–1232.
  52. Hampel, F.R.; Ronchetti, E.M.; Rousseeuw, P.J.; Stahel, W.A. Robust Statistics: The Approach Based on Influence Functions; John Wiley & Sons: Hoboken, NJ, USA, 2011; Volume 196.
  53. Devroye, L.; Gyorfi, L.; Krzyzak, A.; Lugosi, G. On the Strong Universal Consistency of Nearest Neighbor Regression Function Estimates. Ann. Stat. 2007, 22, 1371–1385.
  54. Vovk, V. Kernel Ridge Regression. In Empirical Inference: Festschrift in Honor of Vladimir N. Vapnik; Schölkopf, B., Luo, Z., Vovk, V., Eds.; Springer: Berlin, Germany, 2013; pp. 105–116.
  55. Efron, B.; Hastie, T.; Johnstone, I.; Tibshirani, R. Least Angle Regression. Ann. Stat. 2004, 32, 407–499.
  56. Tibshirani, R. Regression Shrinkage and Selection via the Lasso. J. R. Stat. Soc. Ser. B 1996, 58, 267–288.
  57. Fan, J.; Ma, X.; Wu, L.; Zhang, F.; Yu, X.; Zeng, W. Light Gradient Boosting Machine: An Efficient Soft Computing Model for Estimating Daily Reference Evapotranspiration with Local and External Meteorological Data. Agric. Water Manag. 2019, 225, 105758.
  58. Seber, G.A.F.; Lee, A.J. Linear Regression Analysis; John Wiley & Sons: Hoboken, NJ, USA, 2012; Volume 329.
  59. Murtagh, F. Multilayer Perceptrons for Classification and Regression. Neurocomputing 1991, 2, 183–197.
  60. Rubinstein, R.; Zibulevsky, M.; Elad, M. Efficient Implementation of the K-SVD Algorithm Using Batch Orthogonal Matching Pursuit; Computer Science Department, Technion: Haifa, Israel, 2008; pp. 1–15.
  61. Crammer, K.; Dekel, O.; Keshet, J.; Shalev-Shwartz, S.; Singer, Y. Online Passive-Aggressive Algorithms. J. Mach. Learn. Res. 2006, 7, 551–585.
  62. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
  63. Choi, S.; Kim, T.; Yu, W. Performance Evaluation of RANSAC Family. In Proceedings of the British Machine Vision Conference, London, UK, 7–10 September 2009.
  64. Marquardt, D.W.; Snee, R.D. Ridge Regression in Practice. Am. Stat. 1975, 29, 3–20.
  65. Smola, A.J.; Schölkopf, B. A Tutorial on Support Vector Regression. Stat. Comput. 2004, 14, 199–222.
  66. Dang, X.; Peng, H.; Wang, X.; Zhang, H. The Theil–Sen Estimators in a Multiple Linear Regression Model. Manuscript. Available online: http://home.olemiss.edu/~xdang/papers/ (accessed on 14 October 2021).
  67. PyCaret: An Open Source, Low-Code Machine Learning Library in Python. April 2020. Available online: https://www.pycaret.org (accessed on 12 October 2021).
  68. Keras. GitHub. 2015. Available online: https://github.com/fchollet/keras (accessed on 12 October 2021).
  69. Gulli, A.; Pal, S. Deep Learning with Keras; Packt Publishing: Birmingham, UK, 2017.
  70. Friedman, M. The Use of Ranks to Avoid the Assumption of Normality Implicit in the Analysis of Variance. J. Am. Stat. Assoc. 1937, 32, 675–701.
  71. Dunn, O.J. Multiple Comparisons Among Means. J. Am. Stat. Assoc. 1961, 56, 52.
Figure 1. Final datasets’ construction process.
Figure 2. Text-preprocessing scheme.
Figure 3. Sentiment setups’ CD-diagrams: single-day prediction.
Figure 4. Sentiment setups’ boxplots: single-day prediction.
Figure 5. Sentiment setups’ CD-diagrams: one-week prediction.
Figure 6. Sentiment setups’ boxplots: one-week prediction.
Figure 7. Sentiment setups’ CD-diagrams: two-week prediction.
Figure 8. Sentiment setups’ boxplots: two-week prediction.
Figure 9. Algorithms’ CD-diagrams: single-day prediction.
Figure 10. Algorithms’ boxplots: single-day prediction.
Figure 11. Algorithms’ boxplots: one-week prediction.
Figure 12. Algorithms’ CD-diagrams: one-week prediction.
Figure 13. Algorithms’ CD-diagrams: two-week prediction.
Figure 14. Algorithms’ boxplots: two-week prediction.
Table 1. Stock datasets.

No. | Dataset | Stocks
1 | AAL | American Airlines Group
2 | AMD | Advanced Micro Devices
3 | AUY | Yamana Gold Inc.
4 | BABA | Alibaba Group
5 | BAC | Bank of America Corp.
6 | ET | Energy Transfer L.P.
7 | FCEL | FuelCell Energy Inc.
8 | GE | General Electric
9 | GM | General Motors
10 | INTC | Intel Corporation
11 | MRO | Marathon Oil Corporation
12 | MSFT | Microsoft
13 | OXY | Occidental Petroleum Corporation
14 | RYCEY | Rolls-Royce Holdings
15 | SQ | Square
16 | VZ | Verizon Communications
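Each ticker in Table 1 yields one multivariate dataset by aligning its daily closing prices with the corresponding daily sentiment series (cf. Figure 1). The sketch below shows one way such an alignment could be performed with pandas; the file layout and column names are assumptions made purely for illustration and do not reproduce the paper's actual pipeline.

```python
# Illustrative sketch: align a ticker's daily closing prices with its daily sentiment features.
# File paths and column names ("date", "close", ...) are assumptions made for this example.
import pandas as pd

def build_dataset(ticker: str) -> pd.DataFrame:
    prices = pd.read_csv(f"prices/{ticker}.csv", parse_dates=["date"])        # e.g., date, close, volume
    sentiment = pd.read_csv(f"sentiment/{ticker}.csv", parse_dates=["date"])  # e.g., date, textblob, vader, finbert
    # Inner join keeps trading days only; sentiment on non-trading days is simply dropped here.
    merged = prices.merge(sentiment, on="date", how="inner").sort_values("date")
    return merged.set_index("date")

# Example usage with a ticker from Table 1:
# aal = build_dataset("AAL")
```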
Table 2. Algorithms.

No. | Abbreviation | Algorithm
1 | ABR | AdaBoost Regressor [42]
2 | ARD | Automatic Relevance Determination [43]
3 | BiLSTM (LSTM_2) | Bidirectional LSTM [44]
4 | BiLSTM-LSTM (LSTM_3) | Bidirectional LSTM and LSTM Stacked [44,45]
5 | CBR | CatBoost Regressor [46]
6 | DTR | Decision Tree Regressor [47]
7 | ELN | Elastic Net [48]
8 | ET | Extra Trees Regressor [49]
9 | XGBoost | Extreme Gradient Boosting [50]
10 | GB | Gradient Boosting Regressor [51]
11 | HBR | Huber Regressor [52]
12 | KNR | K-Neighbors Regressor [53]
13 | KER | Kernel Ridge [54]
14 | LSTM | LSTM [45]
15 | LA-LAS | Lasso Least Angle Regression [55]
16 | LAS | Lasso Regression [56]
17 | LA | Least Angle Regression [55]
18 | LGBM | Light Gradient Boosting Machine [57]
19 | LNR | Linear Regression [58]
20 | MLP | Multilayer Perceptron [59]
21 | OMP | Orthogonal Matching Pursuit [60]
22 | PAR | Passive Aggressive Regressor [61]
23 | RF | Random Forest Regressor [62]
24 | RSC | Random Sample Consensus [63]
25 | RDG | Ridge Regression [64]
26 | SVR | Support Vector Regression [65]
27 | THS | Theil–Sen Regressor [66]
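Table 2 lists three recurrent variants, LSTM, BiLSTM (LSTM_2), and the stacked BiLSTM-LSTM (LSTM_3), presumably built with Keras [68,69]. The paper's exact layer sizes and training settings are not restated here; the sketch below only illustrates how a plain LSTM and a stacked bidirectional variant can be wired for sliding-window multivariate regression, with the window length, unit counts, and loss chosen purely for illustration.

```python
# Illustrative Keras sketches for the recurrent entries in Table 2; sizes are arbitrary examples.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Bidirectional, Dense

def plain_lstm(window: int, n_features: int) -> Sequential:
    # Single-layer LSTM regressor over a sliding window of multivariate observations.
    model = Sequential([
        LSTM(64, input_shape=(window, n_features)),
        Dense(1),  # next-value regression output
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

def stacked_bilstm_lstm(window: int, n_features: int) -> Sequential:
    # Rough analogue of the stacked "BiLSTM-LSTM (LSTM_3)" entry; not the paper's exact configuration.
    model = Sequential([
        Bidirectional(LSTM(64, return_sequences=True), input_shape=(window, n_features)),
        LSTM(32),
        Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model
```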