Article
Peer-Review Record

Time Series Dataset Survey for Forecasting with Deep Learning

Forecasting 2023, 5(1), 315-335; https://doi.org/10.3390/forecast5010017
by Yannik Hahn *, Tristan Langer, Richard Meyes and Tobias Meisen
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 25 January 2023 / Revised: 20 February 2023 / Accepted: 28 February 2023 / Published: 3 March 2023
(This article belongs to the Special Issue Recurrent Neural Networks for Time Series Forecasting)

Round 1

Reviewer 1 Report

The paper provides a review of datasets used in the scientific literature on applied time series forecasting. The Authors conducted an extensive review of the literature with careful attention to the structure, volume, and other characteristics of public datasets. Although the study is extensive, the importance of the obtained results is not completely clear:

1. I am sure that the Authors are aware of the M5 (and previous) forecasting competitions with thousands of different time series (https://doi.org/10.1016/j.ijforecast.2021.11.013). This series of competitions seems closely related to the topic but is completely missing from the paper.

2. The obtained list of datasets is subject to the bias of scientific publishing (e.g., there are many public datasets that are frequently used for practical forecasting, but the related studies are not published due to a lack of scientific novelty, since the winning model is often exponential smoothing (ES)). Thus, it is not clear what the area of generalisation of the obtained results is.

3. The main research conclusion, that there is “a research gap of a strongly needed general time series forecasting benchmark dataset”, seems not well supported: given that the internal structure of multivariate time series varies greatly, an approach based on thousands of different datasets seems more appropriate than a single benchmark dataset.

Author Response

Please see the attachment

Author Response File: Author Response.docx

Reviewer 2 Report

Recent advances in Deep Learning have created much enthusiasm in different industrial fields, which has led to many publications pushing innovative techniques to make use of recent neural network architectures. However, the lack of a definitive baseline makes it hard to efficiently validate those techniques and keep track of the state of the art in machine learning problems. This paper attempts to show the limitations that plague the domain of time series forecasting, causing poor reproducibility and comparability. The authors also catalogued and classified frequently used datasets to propose a baseline for potential cross-domain benchmarks.

This is an interesting work using a proper methodology and carefully chosen papers for its survey. However, I am not convinced that the reasoning used for categorizing datasets is sound, and the authors may be drawing conclusions too quickly. Therefore, I believe some issues should be addressed before this manuscript is accepted:

1. General comment: The authors should proofread their manuscript more carefully to avoid some minor issues in language and formatting. For example:

- Introduction: Skipped reference 3
- Line 96: afford --> effort
- Lines 125-129: Repetition of the same phrase
- Line 352: “Figure ??” should be “Figure 6”
- Line 421: determent --> determined

2. Sections 2-5: The methodology part seems to be mixed with the results part. The logical flow of the manuscript should be revised to better separate methods and results.

3. Section 4: For the sake of reproducibility, it would be beneficial if the authors indicated more precisely which time series they used from each publication.

4. Section 5: The workflow is confusing, and the descriptions do not always match the figures. Some clarifications are needed:

- Lines 345-350: Explain more thoroughly the difference between “datasets with the lowest distance to all other datasets” and “datasets to which most other datasets are the closest” (see the sketch after this list for the two notions I have in mind).
- Line 353: Samples 25 and 26 are not shown or are mislabelled.
- Non-stationary time series are very prevalent in real life, and failing to categorize them hurts your results.
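To make the first point above concrete, here is a minimal, hypothetical sketch of the two notions of a “representative” dataset that could be meant; the distance matrix is fabricated and the variable names are my own, not the authors':

```python
# Two ways a dataset can be "central" in a pairwise distance matrix D
# (e.g., MP-distances between the surveyed datasets). Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n = 8
D = rng.random((n, n))
D = (D + D.T) / 2            # symmetrize the toy distances
np.fill_diagonal(D, np.inf)  # so a dataset is never its own nearest neighbour

# (a) "lowest distance to all other datasets": the medoid, i.e. the
#     dataset minimizing the total distance to every other dataset.
row_sums = np.where(np.isinf(D), 0.0, D).sum(axis=1)
medoid = int(np.argmin(row_sums))

# (b) "dataset to which most other datasets are the closest": the most
#     frequent nearest neighbour, i.e. the hub of the 1-NN graph.
nearest = np.argmin(D, axis=1)
hub = int(np.bincount(nearest, minlength=n).argmax())

print(medoid, hub)  # the two need not coincide, hence my request for clarity
```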

5. Section 6: What are the criteria used to set your baselines for PRV and AC? Were they chosen experimentally, or are they arbitrarily defined? A sketch of what a data-driven choice could look like follows below.
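As an illustration of the kind of data-driven criterion I would find convincing, the following sketch derives baselines as medians across series. The definitions of PRV (share of repeating values) and AC (lag-1 autocorrelation) are my own reading of the manuscript, not necessarily the authors' exact formulas:

```python
# Hypothetical, data-driven baselines for PRV and AC; both statistics and
# the median choice are assumptions for illustration only.
import numpy as np

def prv(x):
    """Share of points whose value occurs more than once in the series."""
    _, counts = np.unique(x, return_counts=True)
    return counts[counts > 1].sum() / len(x)

def ac(x, lag=1):
    """Sample autocorrelation at the given lag."""
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

rng = np.random.default_rng(1)
series = [rng.normal(size=200).round(1) for _ in range(10)]  # toy datasets

prv_baseline = np.median([prv(s) for s in series])
ac_baseline = np.median([ac(s) for s in series])
print(prv_baseline, ac_baseline)
```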

Author Response

Please see the attachment

Author Response File: Author Response.docx

Reviewer 3 Report

This paper provides an overview of 44 publicly available time series datasets for the task of forecasting. The 44 datasets are selected by processing data retrieved from Web of Science and Papers with Code with some filtering mechanisms. Previously published survey papers are analyzed and compared. A similarity measure based on the MP-distance is proposed and used to compare the selected datasets. In addition, ADF, AC, and PRV are calculated to represent the statistical characteristics of trend, seasonality, and repeating values, respectively. A density-based clustering algorithm (DBSCAN) is used to categorize the selected datasets.
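For readers less familiar with these building blocks, a hedged sketch of how such a pipeline is commonly assembled is given below; the parameter values (lags, eps, min_samples) and the random inputs are placeholders, not the authors' choices:

```python
# Per-dataset statistics plus density-based clustering on a precomputed
# distance matrix, in the spirit of the surveyed methodology.
import numpy as np
from statsmodels.tsa.stattools import adfuller, acf
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(2)
x = np.cumsum(rng.normal(size=300))     # toy non-stationary series

adf_stat, adf_pvalue = adfuller(x)[:2]  # ADF: H0 is a unit root (non-stationarity)
autocorr = acf(x, nlags=24)             # seasonality shows up as peaks at the period

# DBSCAN over a precomputed pairwise distance matrix (e.g., MP-distances
# between 44 datasets); label -1 marks datasets assigned to no cluster.
D = rng.random((44, 44))
D = (D + D.T) / 2
np.fill_diagonal(D, 0.0)
labels = DBSCAN(eps=0.3, min_samples=3, metric="precomputed").fit_predict(D)
print(adf_pvalue, labels)
```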

This paper is well motivated, and the description of the problem is clear enough to be understandable. The methodology is described in detail, and the selected datasets are analyzed comprehensively. Experiments calculating the MP-distance-based similarity measure, ADF, AC, and PRV are implemented, and the experimental results are discussed.

Major comments:

  1. Do Web of Science and Papers with Code cover ML conferences such as ICML, NeurIPS, AAAI, and so on? If so, what is the indexing delay? It is important that the proceedings of these ML conferences be considered, as this paper focuses on time series datasets for forecasting with deep learning.
  2. Why are 5 and 10 used as the citation thresholds for filtering publications? Do 5 and 10 correspond to statistical measures such as percentiles? (A small sketch of this check follows these comments.)
  3. Most of the selected datasets exhibit strong patterns such as seasonality or periodicity, which makes the forecasting task relatively easy and thus less interesting than forecasting series with weak patterns. It would be interesting to include time series datasets without strong patterns in the survey.
  4. As there is no ground truth for the similarity between the selected datasets, it would be better to add more discussion of the results obtained with the MP-distance.
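Regarding major comment 2, the check I have in mind is simple; the citation counts below are fabricated placeholders, but the same two lines applied to the real corpus would show whether 5 and 10 sit at meaningful percentiles:

```python
# Do the citation cut-offs of 5 and 10 correspond to percentiles of the
# corpus's citation distribution? Counts here are synthetic placeholders.
import numpy as np

citations = np.random.default_rng(3).poisson(lam=6, size=500)
for threshold in (5, 10):
    pct = (citations < threshold).mean() * 100
    print(f"threshold {threshold} ~ {pct:.0f}th percentile of this sample")
```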

Minor comments:

  1. Figure ?? in line 352.

Author Response

Please see the attachment

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Thank you for the feedback and the changes provided; I'm happy with the improvements made.
