Peer-Review Record

Can Groups Improve Expert Economic and Financial Forecasts?

Forecasting 2022, 4(3), 699-716; https://doi.org/10.3390/forecast4030038
by Warwick Smith 1, Anca M. Hanea 2 and Mark A. Burgman 3,*
Reviewer 1: Anonymous
Reviewer 2:
Submission received: 27 May 2022 / Revised: 22 July 2022 / Accepted: 26 July 2022 / Published: 2 August 2022

Round 1

Reviewer 1 Report

I enjoyed reading the paper: it is well structured, and the discussion is easy to follow. The methods and results are clearly explained.

I did not find any serious issue or mistake. I only have two suggestions for improvement that would make the analysis more interesting and the conclusions more in line with the presented results.

First, I suggest that the authors further investigate the economic relevance of the differences found among the predictions, particularly given that, in general, no statistical outperformance has been documented. For example, the authors could indicate the economic size of the reduction in prediction errors that can be obtained and its relevance in specific subperiods and/or for specific variables, highlighting the possible or realised consequences in economic terms.

Addressing this will probably also help with my second point: I found a general overstatement of the results. From what has been presented, my conclusion is that forming groups of experts generally does not provide a significant outperformance. I suggest that the authors revise their conclusions accordingly.

Author Response

We thank the reviewer for the insightful comments. We have addressed the reviewer's comments and augmented the paper (with new text marked in red).

We agree that the economic relevance of the differences among the predictions would be an excellent addition to the paper. Unfortunately, we did not find such significant differences. We argue that the differences we found are an indication of the possibility of finding larger (and more significant) differences when structured expert judgement protocols are used.


We have augmented the discussion in Section 4 to say (paraphrasing here) that the surveys compared very favourably to Treasury, given that they were not conducted for the purpose of producing group forecasts. This suggests that, had we used structured expert judgement protocols, the group forecasts would have outperformed Treasury. We clarified that we conclude from the data that these techniques have the potential to improve forecasting, not that our analysis itself improved the forecasting.

Reviewer 2 Report

The paper concerns an important problem: the comparison of group and individual forecasts. The paper is well written and clearly explained, but it should be improved in methodological terms, also linking the work to papers that are not currently among the references. Specific points, which require a response from the authors, are the following:

1. To compare forecasts, the authors use MAPE, ALRE, and SMdAPE, but they do not clearly explain why they use these measures or what the rationale behind this choice is. An explanation should be provided, also linking it with review papers on forecasting such as Gneiting et al. (2011): Making and evaluating point forecasts, JASA (an illustrative sketch of these measures appears after this list).

 

2. The authors compare forecasts using metrics that are conceived for continuous response variables. The authors should compare group and individual predictions of other types of variables, for instance, binary (where the AUROC is used) and ordinal (where Spearman's rank correlation is used). See again Gneiting (2011, JASA).

3. In the results section, the authors consider different types of weighting schemes for experts. They should also consider weighting experts according to the number of questions they answer.

4. The work in the paper is connected with the idea of "model ensembles" known in the machine learning literature (see e.g. Friedman, Hastie, Tibshirani: "The Elements of Statistical Learning", Springer, 2009) and with the concept of "Bayesian model averaging" (see e.g. Giudici, Mezzetti, Muliere: "Mixtures of products of Dirichlet process for variable selection in survival analysis", Journal of Statistical Planning and Inference, 2003). The authors should refer to these concepts at least from a referencing point of view. These extensions would also allow better consideration of the case of interval or quantile predictions (see e.g. Bracher et al., 2021, "Evaluating epidemic forecasts in an interval format", PLoS One).

5. The authors should consider how the comparative accuracy of their predictions evolves over time, comparing one-step-ahead predictions with multi-step ones.
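To make the measures named in point 1 concrete, here is a minimal sketch in Python of common textbook definitions of MAPE and SMdAPE, together with one plausible reading of ALRE as a mean absolute log-ratio error. The exact formulas used in the manuscript are not reproduced here, and the alre function in particular is an assumption made for illustration only.

import numpy as np

def mape(actual, forecast):
    # Mean absolute percentage error, in percent.
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))

def smdape(actual, forecast):
    # Symmetric median absolute percentage error, in percent.
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return 100.0 * np.median(2.0 * np.abs(forecast - actual) / (np.abs(actual) + np.abs(forecast)))

def alre(actual, forecast):
    # Assumed definition: mean absolute log ratio of forecast to outcome (illustrative only).
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return np.mean(np.abs(np.log(forecast / actual)))

# Example: a group forecast of three quantities (hypothetical values).
y_true = [2.5, 1.8, 3.1]
y_hat = [2.2, 2.0, 2.9]
print(mape(y_true, y_hat), smdape(y_true, y_hat), alre(y_true, y_hat))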

 

If the authors can satisfactorily address these questions, the paper can be reconsidered.

Author Response

We thank the reviewer for the insightful comments. We have addressed the comments and augmented the paper accordingly (with new text marked in red).

  1. We have augmented the discussion at the beginning of Section 2.2 to clarify the choice of measures used in the analysis. We have also linked our discussion to the discussion in Gneiting et al. (2011). In addition, we have added a paragraph at the end of Section 2.1 which highlights where some of the limitations of our analysis come from.
  2. We have now clarified in various places in the paper that the only elicited answers were for continuous response variables. We have also augmented the discussion in Section 2.3 to clarify why other measures are not appropriate.
  3. Thank you for this comment. We added a couple of paragraphs at the end of Section 2.3 where we discuss the type of weighting used when aggregating expert-elicited data in general, and in our study in particular. In a way, we do implicitly consider the number of questions answered by each expert through the way we calculate the weights (see the illustrative sketch after this list).
  4. Thank you for this suggestion. We now discuss the similarities between model averaging and the aggregation of expert elicited data just before Section 3. We have also referred to the Friedman, Hastie, Tibshirani book.
  5. We agree that comparing one-step-ahead to multi-step predictions would enhance the analysis. Unfortunately, the size of the dataset does not allow for such an analysis. A simplified version of this is treated in the paper when comparing yearly performance with cumulative performance, allowing gaps in estimation to still contribute to the performance measured further in time.
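To illustrate the weighting point in response 3 above, here is a minimal Python sketch of one simple performance-based aggregation scheme. It is not claimed to be the scheme used in the paper; it only shows one way the number of questions an expert answers can enter implicitly, because weights are renormalised over the experts who respond to each question.

import numpy as np

def weighted_group_forecast(forecasts, past_abs_errors):
    # forecasts: one value per expert for a single question, np.nan if unanswered.
    # past_abs_errors: each expert's mean absolute error on previously resolved questions.
    forecasts = np.asarray(forecasts, float)
    past_abs_errors = np.asarray(past_abs_errors, float)
    answered = ~np.isnan(forecasts)
    raw_w = 1.0 / past_abs_errors[answered]   # better past performance -> larger weight
    w = raw_w / raw_w.sum()                   # renormalise over responders only
    return float(np.sum(w * forecasts[answered]))

# Example: three experts, the second of whom skipped this question.
print(weighted_group_forecast([2.4, np.nan, 3.0], [0.5, 0.2, 1.0]))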

Round 2

Reviewer 2 Report

The authors have replied positively to my remarks. The only point that remains open concerns the analogies between the aggregation of expert predictions and statistical learning methods such as ensemble models and Bayesian model averaging. While the authors have added a comment and a reference on the former, they should also add a short comment and references on Bayesian model averaging. A Bayesian model averaging approach makes it possible to explicitly take model uncertainty into account, embedding it in the predictions, and to assign weights to the different experts or features depending on their statistical importance, as illustrated, for example, in the previously cited paper: Giudici, P., Muliere, P., Mezzetti, M. (2004). Mixtures of Dirichlet process priors for variable selection in survival analysis. Journal of Statistical Planning and Inference, vol. 111, n. 1-2, pp. 101-115.

 

The authors should address the above comment in their reply.
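As background to the Bayesian model averaging idea raised above, a minimal generic sketch follows in Python, with posterior model weights approximated from BIC values. This is an illustration of the general technique only, not the approach of the cited paper or of the manuscript under review.

import numpy as np

def bma_prediction(predictions, bics, prior=None):
    # predictions: point predictions from each candidate model.
    # bics: BIC of each fitted model; exp(-BIC/2) approximates the marginal likelihood.
    # prior: prior model probabilities (uniform if None).
    predictions = np.asarray(predictions, float)
    bics = np.asarray(bics, float)
    prior = np.full(len(bics), 1.0 / len(bics)) if prior is None else np.asarray(prior, float)
    log_w = -0.5 * (bics - bics.min()) + np.log(prior)
    w = np.exp(log_w)
    w /= w.sum()                               # posterior model probabilities
    return float(np.sum(w * predictions)), w

# Example: averaging three models' forecasts of the same quantity (hypothetical values).
pred, weights = bma_prediction([2.1, 2.6, 3.0], bics=[100.2, 101.5, 105.0])
print(pred, weights)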

 

Author Response

We have now complemented the text about statistical learning methods with a few words on Bayesian model averaging and the suggested reference. 

 
