# Time Series Dataset Survey for Forecasting with Deep Learning

## Abstract

## 1. Introduction

1. We provide a cross-domain overview of existing publicly available time series forecasting datasets that have been used in research.
2. We analyze these datasets regarding their domain, file and data structure, and general statistical characteristics, and compare them quantitatively by computing their similarity.
3. We provide an overview of all public time series forecasting datasets identified in this publication and facilitate easy access with a list of links to all these datasets.
4. We facilitate comparability in the time series forecasting research area by calculating a grouping of datasets using the aforementioned similarity measures.

## 2. Related Survey Publications

## 3. Methodology

#### 3.1. Paper Screening

1. As a first step, we screened papers found in the “Web of Science” [4] to identify publicly available time series forecasting datasets that are used in research. To ensure the impact of the datasets in research, we limited our choice to papers that had been cited at least ten times. This reduced the number of papers from over 1000 to 207 and ensured that relevant papers with common datasets were not excluded. Another goal was to identify datasets already used for deep learning. We achieved this by adding the constraint of “deep learning” to the search query. As a result, the following Web of Science query was used:

2. Because newly published papers had not yet had the chance to acquire ten citations, papers from the most recent year, 2021, were required to have at least five and at most ten citations; papers with more citations were already included in step one. We used the same Web of Science query as before and extracted 43 new publications on 17 December 2021.
3. To widen the search for datasets used in academic publications, we additionally used the website “papers with code” [29]. This ensured the inclusion of recent conference publications that were not listed on “Web of Science”. The “papers with code” website has a collection of publicly available datasets with the associated papers that have published their code and results on a dataset. Furthermore, the website ranks publications per dataset by the number of stars of their corresponding GitHub repositories. We filtered the datasets by the categories “time series” and “forecasting” to collect the datasets with their corresponding publications. If a dataset had more than ten publications, we selected the ten top-ranked ones, resulting in 43 additional publications with eight datasets. We then screened these publications to find public datasets not identified by the website “papers with code”.
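The citation rules and the ranking rule described in the three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Paper` record, its field names, the helper functions, and the sample entries are hypothetical and are not taken from the original screening scripts or from any Web of Science or “papers with code” API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    """Hypothetical record for one search hit; field names are illustrative."""
    title: str
    year: int
    citations: int


def passes_step_one(paper: Paper) -> bool:
    # Step 1: keep papers that have been cited at least ten times.
    return paper.citations >= 10


def passes_step_two(paper: Paper, recent_year: int = 2021) -> bool:
    # Step 2: papers from the most recent year need between five and ten citations.
    return paper.year == recent_year and 5 <= paper.citations <= 10


def screen(papers: list[Paper]) -> list[Paper]:
    # A paper is kept if it satisfies either citation rule.
    return [p for p in papers if passes_step_one(p) or passes_step_two(p)]


def top_ranked(publications: list[tuple[str, int]], limit: int = 10) -> list[str]:
    # Step 3: rank a dataset's publications by GitHub stars (second tuple element)
    # and keep at most `limit` of them; shorter lists are returned in full.
    ranked = sorted(publications, key=lambda item: item[1], reverse=True)
    return [title for title, _stars in ranked[:limit]]


# Made-up sample entries, used only to exercise the rules.
candidates = [
    Paper("older, well cited", 2018, 42),
    Paper("older, rarely cited", 2019, 6),
    Paper("recent, moderately cited", 2021, 7),
    Paper("recent, barely cited", 2021, 2),
]
kept = [p.title for p in screen(candidates)]
```

With the sample entries, only the well-cited older paper and the moderately cited 2021 paper remain in `kept`.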

#### 3.2. Fundamentals of Statistical Time Series Characteristics

## 4. Time Series Domains

## 5. Screening of Public Datasets

1. The data must be publicly accessible and not hidden behind a particular sign-in or only available on request, to give an overview of generally available public datasets.
2. The dataset must be directly downloadable as files to ensure reproducibility. Datasets that can only be accessed through a web view or dashboard, where multiple parameters need to be selected, were not included.
3. Datasets were not considered if the data was only available in a specific country or the website was not in English, to ensure consistent access to the datasets.

## 6. Comparison of Selected Datasets

1. The forecast value must be clearly defined in a paper or a dataset description.
2. The defined forecasting value should not be aggregated over a period of time or locations.
3. For comparability, the target must be a univariate time series.

#### 6.1. Comparison of Selected Datasets with MPdist

#### 6.2. Comparison of Selected Datasets with Statistical Characteristics

## 7. Categorize the Datasets

1. **stationary/high PRV/low to medium AC**: A time series that is stationary and has many repeating values, which are distributed either irregularly or in only some regular patterns. Similar datasets can be found in cluster one.
2. **stationary/high PRV/high AC**: A time series that is stationary and has many repeating values, which are distributed in regular patterns. Similar datasets can be found in cluster two.
3. **stationary/low PRV/low AC**: A time series that is stationary and has many unique values, which are distributed in irregular patterns. Similar datasets can be found in cluster three.
4. **stationary/low PRV/high AC**: A time series that is stationary and has many unique values, which are distributed in regular patterns. We identified only the dataset with ID 18, which lies in the outlier cluster. This indicates that this category does not naturally appear in datasets used in research. The single occurrence could be caused by the multiple domains that are combined in that dataset.
5. **non stationary**: Because we identified only a small number of non-stationary datasets, this category could not be used for comparison. It is possible that there are additional clusters that we did not identify. Nevertheless, the work done in this paper could be an indicator that not many non-stationary time series datasets for forecasting are used in publications.
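To make the five categories concrete, the following minimal sketch assigns a category label from three numbers per dataset. It is not the procedure used in this paper, where the groups are derived from the clustering. The function `categorize`, the reading of the ADF column as a p-value, and the cut-offs `alpha`, `high_prv`, and `high_ac` are all assumptions chosen for illustration.

```python
def categorize(adf_p_value: float, ac: float, prv: float,
               alpha: float = 0.05,
               high_prv: float = 0.5,
               high_ac: float = 0.5) -> str:
    """Map (ADF p-value, AC, PRV) to one of the five category labels.

    All three thresholds are hypothetical; the source does not state cut-offs.
    """
    # The ADF null hypothesis is a unit root, so a p-value at or above alpha
    # means non-stationarity cannot be rejected.
    if adf_p_value >= alpha:
        return "non stationary"

    prv_label = "high PRV" if prv >= high_prv else "low PRV"

    if ac >= high_ac:
        ac_label = "high AC"
    elif prv_label == "high PRV":
        # The source names this group "low to medium AC".
        ac_label = "low to medium AC"
    else:
        ac_label = "low AC"

    return f"stationary/{prv_label}/{ac_label}"


# Dataset ID 18 with its reported values (ADF 0.0000, AC 0.9195, PRV 0.0000).
label_id_18 = categorize(adf_p_value=0.0000, ac=0.9195, prv=0.0000)
```

Under these assumed thresholds, the reported values for dataset ID 18 yield the label of the fourth category, which agrees with the text above. For other datasets the result depends on the chosen cut-offs and need not match the cluster assignment.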

## 8. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

1. Längkvist, M.; Karlsson, L.; Loutfi, A. A review of unsupervised feature learning and deep learning for time-series modeling. Pattern Recognit. Lett. **2014**, 42, 11–24.
2. Li, S.; Jin, X.; Xuan, Y.; Zhou, X.; Chen, W.; Wang, Y.X.; Yan, X. Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting. Adv. Neural Inf. Process. Syst. **2019**, 32, 5243–5253.
3. Ismail Fawaz, H.; Forestier, G.; Weber, J.; Idoumghar, L.; Muller, P.A. Deep learning for time series classification: A review. Data Min. Knowl. Discov. **2019**, 33, 917–963.
4. Web of Science. Available online: https://www.webofscience.com/wos/woscc/basic-search (accessed on 19 October 2021).
5. Deng, L. The mnist database of handwritten digit images for machine learning research. IEEE Signal Process. Mag. **2012**, 29, 141–142.
6. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
7. Chen, Y.; Keogh, E.; Hu, B.; Begum, N.; Bagnall, A.; Mueen, A.; Batista, G. The UCR Time Series Classification Archive. 2015. Available online: www.cs.ucr.edu/eamonn/timeseriesdata/ (accessed on 1 February 2023).
8. Bagnall, A.; Lines, J.; Bostrom, A.; Large, J.; Keogh, E. The great time series classification bake off: A review and experimental evaluation of recent algorithmic advances. Data Min. Knowl. Discov. **2017**, 31, 606–660.
9. Laptev, S.A.N.; Billawala, Y. S5-A Labeled Anomaly Detection Dataset, version 1.0 (16M). Available online: https://webscope.sandbox.yahoo.com/catalog.php?datatype=s&%20did=70&guccounter=1 (accessed on 1 February 2023).
10. Ahmad, S.; Lavin, A.; Purdy, S.; Agha, Z. Unsupervised real-time anomaly detection for streaming data. Neurocomputing **2017**, 262, 134–147.
11. Wu, R.; Keogh, E. Current time series anomaly detection benchmarks are flawed and are creating the illusion of progress. In Proceedings of the 2022 IEEE 38th International Conference on Data Engineering (ICDE), Kuala Lumpur, Malaysia, 9–12 May 2022.
12. Ahmed, R.; Sreeram, V.; Mishra, Y.; Arif, M.D. A review and evaluation of the state-of-the-art in PV solar power forecasting: Techniques and optimization. Renew. Sustain. Energy Rev. **2020**, 124, 109792.
13. Aslam, S.; Herodotou, H.; Mohsin, S.M.; Javaid, N.; Ashraf, N.; Aslam, S. A survey on deep learning methods for power load and renewable energy forecasting in smart microgrids. Renew. Sustain. Energy Rev. **2021**, 144, 110992.
14. Chandra, R.; Goyal, S.; Gupta, R. Evaluation of Deep Learning Models for Multi-Step Ahead Time Series Prediction. IEEE Access **2021**, 9, 83105–83123.
15. Chen, C.H.; Kung, H.Y.; Hwang, F.J. Deep Learning Techniques for Agronomy Applications. Agronomy **2019**, 9, 142.
16. Dikshit, A.; Pradhan, B.; Alamri, A.M. Pathways and challenges of the application of artificial intelligence to geohazards modelling. Gondwana Res. **2021**, 100, 290–301.
17. Ghalehkhondabi, I.; Ardjmand, E.; Young, W.A.; Weckman, G.R. Water demand forecasting: Review of soft computing methods. Environ. Monit. Assess. **2017**, 189, 313.
18. Lara-Benitez, P.; Carranza-Garcia, M.; Riquelme, J.C. An Experimental Review on Deep Learning Architectures for Time Series Forecasting. Int. J. Neural Syst. **2021**, 31, 2130001.
19. Liu, H.; Yan, G.; Duan, Z.; Chen, C. Intelligent modeling strategies for forecasting air quality time series: A review. Appl. Soft Comput. **2021**, 102, 106957.
20. Mosavi, A.; Salimi, M.; Ardabili, S.F.; Rabczuk, T.; Shamshirband, S.; Varkonyi-Koczy, A.R. State of the Art of Machine Learning Models in Energy Systems, a Systematic Review. Energies **2019**, 12, 1301.
21. Sengupta, S.; Basak, S.; Saikia, P.; Paul, S.; Tsalavoutis, V.; Atiah, F.; Ravi, V.; Peters, A. A review of deep learning with special emphasis on architectures, applications and recent trends. Knowl.-Based Syst. **2020**, 194, 105596.
22. Somu, N.; Raman, G.M.R.; Ramamritham, K. A deep learning framework for building energy consumption forecast. Renew. Sustain. Energy Rev. **2021**, 137, 110591.
23. Sun, A.Y.; Scanlon, B.R. How can Big Data and machine learning benefit environment and water management: A survey of methods, applications, and future directions. Environ. Res. Lett. **2019**, 14, 073001.
24. Wang, H.; Liu, Y.; Zhou, B.; Li, C.; Cao, G.; Voropai, N.; Barakhtenko, E. Taxonomy research of artificial intelligence for deterministic solar power forecasting. Energy Convers. Manag. **2020**, 214, 112909.
25. Wei, N.; Li, C.; Peng, X.; Zeng, F.; Lu, X. Conventional models and artificial intelligence-based models for energy consumption forecasting: A review. J. Pet. Sci. Eng. **2019**, 181, 106187.
26. Weiss, M.; Jacob, F.; Duveiller, G. Remote sensing for agricultural applications: A meta-review. Remote Sens. Environ. **2020**, 236, 111402.
27. Zambrano, F.; Vrieling, A.; Nelson, A.; Meroni, M.; Tadesse, T. Prediction of drought-induced reduction of agricultural productivity in Chile from MODIS, rainfall estimates, and climate oscillation indices. Remote Sens. Environ. **2018**, 219, 15–30.
28. Sezer, O.B.; Gudelek, M.U.; Ozbayoglu, A.M. Financial time series forecasting with deep learning: A systematic literature review: 2005–2019. Appl. Soft Comput. **2020**, 90, 106181.
29. Paper with Code. Available online: https://paperswithcode.com/ (accessed on 25 February 2022).
30. Cheung, Y.W.; Lai, K.S. Lag order and critical values of the augmented Dickey–Fuller test. J. Bus. Econ. Stat. **1995**, 13, 277–280.
31. Nason, G.P. Stationary and non-stationary time series. Stat. Volcanol. **2006**, 60, 129–142.
32. Cheung, Y.W.; Lai, K.S. Power of the augmented dickey-fuller test with information-based lag selection. J. Stat. Comput. Simul. **1998**, 60, 57–65.
33. Mushtaq, R. Augmented dickey fuller test. Econom. Math. Methods Program. Ejournal **2011**.
34. Moineddin, R.; Upshur, R.E.; Crighton, E.; Mamdani, M. Autoregression as a means of assessing the strength of seasonality in a time series. Popul. Health Metrics **2003**, 1, 10.
35. Percival, D.B. Three curious properties of the sample variance and autocovariance for stationary processes with unknown mean. Am. Stat. **1993**, 47, 274–276.
36. Chen, Y.; Wang, Y.; Kirschen, D.; Zhang, B. Model-Free Renewable Scenario Generation Using Generative Adversarial Networks. IEEE Trans. Power Syst. **2018**, 33, 3265–3275.
37. Du, S.; Li, T.; Yang, Y.; Horng, S.J. Multivariate time series forecasting via attention-based encoder-decoder framework. Neurocomputing **2020**, 388, 269–279.
38. Du, S.; Li, T.; Yang, Y.; Horng, S.J. Deep Air Quality Forecasting Using Hybrid Deep Learning Framework. IEEE Trans. Knowl. Data Eng. **2021**, 33, 2412–2424.
39. Li, T.; Hua, M.; Wu, X. A Hybrid CNN-LSTM Model for Forecasting Particulate Matter (PM2.5). IEEE Access **2020**, 8, 26933–26940.
40. Huang, G.; Li, X.; Zhang, B.; Ren, J. PM2.5 concentration forecasting at surface monitoring sites using GRU neural network based on empirical mode decomposition. Sci. Total. Environ. **2021**, 768, 144516.
41. Kim, T.Y.; Cho, S.B. Predicting residential energy consumption using CNN-LSTM neural networks. Energy **2019**, 182, 72–81.
42. Xu, W.; Peng, H.; Zeng, X.; Zhou, F.; Tian, X.; Peng, X. A hybrid modelling method for time series forecasting based on a linear regression model and deep learning. Appl. Intell. **2019**, 49, 3002–3015.
43. Jin, X.; Park, Y.; Maddix, D.; Wang, H.; Wang, Y. Domain adaptation for time series forecasting via attention sharing. In Proceedings of the International Conference on Machine Learning, Paris, France, 29–31 April 2022; PMLR: London, UK, 2022; pp. 10280–10297.
44. Kim, K.; Kim, D.K.; Noh, J.; Kim, M. Stable Forecasting of Environmental Time Series via Long Short Term Memory Recurrent Neural Network. IEEE Access **2018**, 6, 75216–75228.
45. Wu, S.; Xiao, X.; Ding, Q.; Zhao, P.; Wei, Y.; Huang, J. Adversarial sparse transformer for time series forecasting. Adv. Neural Inf. Process. Syst. **2020**, 33, 17105–17115.
46. Alexandrov, A.; Benidis, K.; Bohlke-Schneider, M.; Flunkert, V.; Gasthaus, J.; Januschowski, T.; Maddix, D.C.; Rangapuram, S.; Salinas, D.; Schulz, J.; et al. GluonTS: Probabilistic Time Series Models in Python. arXiv **2019**, arXiv:1906.05264.
47. Feng, M.; Zheng, J.; Ren, J.; Hussain, A.; Li, X.; Xi, Y.; Liu, Q. Big Data Analytics and Mining for Effective Visualization and Trends Forecasting of Crime Data. IEEE Access **2019**, 7, 106111–106123.
48. Fang, K.; Shen, C.; Kifer, D.; Yang, X. Prolongation of SMAP to Spatiotemporally Seamless Coverage of Continental US Using a Deep Learning Neural Network. Geophys. Res. Lett. **2017**, 44, 11030–11039.
49. Nigri, A.; Levantesi, S.; Marino, M.; Scognamiglio, S.; Perla, F. A Deep Learning Integrated Lee-Carter Model. Risks **2019**, 7, 33.
50. Sagheer, A.; Kotb, M. Unsupervised Pre-training of a Deep LSTM-based Stacked Autoencoder for Multivariate Time Series Forecasting Problems. Sci. Rep. **2019**, 9, 19038.
51. Munir, M.; Siddiqui, S.A.; Dengel, A.; Ahmed, S. DeepAnT: A Deep Learning Approach for Unsupervised Anomaly Detection in Time Series. IEEE Access **2019**, 7, 1991–2005.
52. Raissi, M. Deep Hidden Physics Models: Deep Learning of Nonlinear Partial Differential Equations. J. Mach. Learn. Res. **2018**, 19, 357.
53. Shih, S.Y.; Sun, F.K.; Lee, H.y. Temporal pattern attention for multivariate time series forecasting. Mach. Learn. **2019**, 108, 1421–1441.
54. Liu, M.; Zeng, A.; Lai, Q.; Xu, Q. Time Series is a Special Sequence: Forecasting with Sample Convolution and Interaction. arXiv **2021**, arXiv:2106.09305.
55. Madhusudhanan, K.; Burchert, J.; Duong-Trung, N.; Born, S.; Schmidt-Thieme, L. Yformer: U-Net Inspired Transformer Architecture for Far Horizon Time Series Forecasting. arXiv **2021**, arXiv:2110.08255.
56. Shen, L.; Wang, Y. TCCT: Tightly-Coupled Convolutional Transformer on Time Series Forecasting. Neurocomputing **2022**, 480, 131–145.
57. Woo, G.; Liu, C.; Sahoo, D.; Kumar, A.; Hoi, S. Etsformer: Exponential smoothing transformers for time-series forecasting. arXiv **2022**, arXiv:2202.01381.
58. Wu, H.; Xu, J.; Wang, J.; Long, M. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. Adv. Neural Inf. Process. Syst. **2021**, 34, 22419–22430.
59. Yue, Z.; Wang, Y.; Duan, J.; Yang, T.; Huang, C.; Tong, Y.; Xu, B. TS2Vec: Towards Universal Representation of Time Series. arXiv **2021**, arXiv:2106.10466.
60. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 11106–11115.
61. Deng, J.; Chen, X.; Jiang, R.; Song, X.; Tsang, I.W. A Multi-view Multi-task Learning Framework for Multi-variate Time Series Forecasting. arXiv **2021**, arXiv:2109.01657.
62. Du, W.; Côté, D.; Liu, Y. Saits: Self-attention-based imputation for time series. Expert Syst. Appl. **2023**, 219, 119619.
63. Lai, G.; Chang, W.C.; Yang, Y.; Liu, H. Modeling long-and short-term temporal patterns with deep neural networks. In Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, Ann Arbor, MI, USA, 8–12 July 2018; pp. 95–104.
64. Minhao, L.; Zeng, A.; Chen, M.; Xu, Z.; Qiuxia, L.; Ma, L.; Xu, Q. SCINet: Time Series Modeling and Forecasting with Sample Convolution and Interaction. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 28 November–9 December 2022.
65. Wu, Z.; Pan, S.; Long, G.; Jiang, J.; Chang, X.; Zhang, C. Connecting the dots: Multivariate time series forecasting with graph neural networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual, 23–27 August 2020; pp. 753–763.
66. Zhou, T.; Ma, Z.; Wen, Q.; Wang, X.; Sun, L.; Jin, R. Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting. In Proceedings of the International Conference on Machine Learning, Paris, France, 29–31 April 2022; PMLR: London, UK, 2022; pp. 27268–27286.
67. Liu, Y.; Wu, H.; Wang, J.; Long, M. Non-stationary Transformers: Exploring the Stationarity in Time Series Forecasting. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 28 November 2022.
68. Wojtkiewicz, J.; Hosseini, M.; Gottumukkala, R.; Chambers, T.L. Hour-Ahead Solar Irradiance Forecasting Using Multivariate Gated Recurrent Units. Energies **2019**, 12, 4055.
69. Zuo, G.; Luo, J.; Wang, N.; Lian, Y.; He, X. Decomposition ensemble model based on variational mode decomposition and long short-term memory for streamflow forecasting. J. Hydrol. **2020**, 585, 124776.
70. Samal, K.K.R.; Babu, K.S.; Das, S.K. Multi-directional temporal convolutional artificial neural network for PM2.5 forecasting with missing values: A deep learning approach. Urban Clim. **2021**, 36, 100800.
71. Zhang, Z.; Zeng, Y.; Yan, K. A hybrid deep learning technology for PM2.5 air quality forecasting. Environ. Sci. Pollut. Res. **2021**, 28, 39409–39422.
72. Harutyunyan, H.; Khachatrian, H.; Kale, D.C.; Ver Steeg, G.; Galstyan, A. Multitask learning and benchmarking with clinical time series data. Sci. Data **2019**, 6, 96.
73. Hewamalage, H.; Bergmeir, C.; Bandara, K. Recurrent Neural Networks for Time Series Forecasting: Current status and future directions. Int. J. Forecast. **2021**, 37, 388–427.
74. Kang, Y.; Hyndman, R.J.; Li, F. GRATIS: GeneRAting TIme Series with diverse and controllable characteristics. Stat. Anal. Data Mining Asa Data Sci. J. **2020**, 13, 354–376.
75. Ng, E.; Wang, Z.; Chen, H.; Yang, S.; Smyl, S. Orbit: Probabilistic Forecast with Exponential Smoothing. arXiv **2021**, arXiv:2004.08492v4.
76. Oreshkin, B.N.; Carpov, D.; Chapados, N.; Bengio, Y. N-BEATS: Neural Basis Expansion Analysis for Interpretable Time Series Forecasting. arXiv **2020**, arXiv:1905.10437v4.
77. Bhatnagar, A.; Kassianik, P.; Liu, C.; Lan, T.; Yang, W.; Cassius, R.; Sahoo, D.; Arpit, D.; Subramanian, S.; Woo, G.; et al. Merlion: A Machine Learning Library for Time Series. arXiv **2020**, arXiv:2109.09265v1.
78. Redd, A.; Khin, K.; Marini, A. Fast ES-RNN: A GPU Implementation of the ES-RNN Algorithm. arXiv **2019**, arXiv:1907.03329v1.
79. Klimek, J.; Klimek, J.; Kraskiewicz, W.; Topolewski, M. Long-Term Series Forecasting with Query Selector—Efficient Model of Sparse Attention. arXiv **2021**, arXiv:2107.08687v2.
80. Deshpande, P.; Sarawagi, S. Long Range Probabilistic Forecasting in Time-Series using High Order Statistics. arXiv **2021**, arXiv:2111.03394.
81. Yang, L.; Hong, S.; Zhang, L. Iterative Bilinear Temporal-Spectral Fusion for Unsupervised Representation Learning in Time Series. Available online: https://openreview.net/forum?id=MjbdO3_ihp (accessed on 25 February 2022).
82. Koochali, A.; Schichtel, P.; Dengel, A.; Ahmed, S. Probabilistic forecasting of sensory data with generative adversarial networks–forgan. IEEE Access **2019**, 7, 63868–63880.
83. Bondarenko, I. More layers! End-to-end regression and uncertainty on tabular data with deep learning. arXiv **2021**, arXiv:2112.03566.
84. Malinin, A.; Band, N.; Chesnokov, G.; Gal, Y.; Gales, M.J.F.; Noskov, A.; Ploskonosov, A.; Prokhorenkova, L.; Provilkov, I.; Raina, V.; et al. Shifts: A dataset of real distributional shift across multiple large-scale tasks. arXiv **2021**, arXiv:2107.07455.
85. Choudhry, A.; Moon, B.; Patrikar, J.; Samaras, C.; Scherer, S. CVaR-based Flight Energy Risk Assessment for Multirotor UAVs using a Deep Energy Model. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 262–268.
86. Rodrigues, T.A.; Patrikar, J.; Choudhry, A.; Feldgoise, J.; Arcot, V.; Gahlaut, A.; Lau, S.; Moon, B.; Wagner, B.; Matthews, H.S.; et al. In-flight positional and energy use data set of a DJI Matrice 100 quadcopter for small package delivery. Sci. Data **2021**, 8, 155.
87. Patrikar, J.; Moon, B.; Oh, J.; Scherer, S. Predicting Like A Pilot: Dataset and Method to Predict Socially-Aware Aircraft Trajectories in Non-Towered Terminal Airspace. arXiv **2021**, arXiv:2109.15158.
88. Makridakis, S.; Spiliotis, E.; Assimakopoulos, V. The M4 Competition: Results, findings, conclusion and way forward. Int. J. Forecast. **2018**, 34, 802–808.
89. Spiliotis, E.; Assimakopoulos, V.; Makridakis, S.; Assimakopoulos, V. The M5 Accuracy competition: Results, findings and conclusions. Int. J. Forecast. **2022**, 38, 1346–1364.
90. Khodayar, M.; Wang, J. Spatio-Temporal Graph Deep Neural Network for Short-Term Wind Speed Forecasting. IEEE Trans. Sustain. Energy **2019**, 10, 670–681.
91. Peng, Z.; Peng, S.; Fu, L.; Lu, B.; Tang, J.; Wang, K.; Li, W. A novel deep learning ensemble model with data denoising for short-term wind speed forecasting. Energy Convers. Manag. **2020**, 207, 112524.
92. Gharghabi, S.; Imani, S.; Bagnall, A.; Darvishzadeh, A.; Keogh, E. Matrix profile xii: Mpdist: A novel time series distance measure to allow data mining in more challenging scenarios. In Proceedings of the 2018 IEEE International Conference on Data Mining (ICDM), Singapore, 17–20 November 2018; pp. 965–970.
93. Madrid, F.; Imani, S.; Mercer, R.; Zimmerman, Z.; Shakibay, N.; Keogh, E. Matrix profile xx: Finding and visualizing time series motifs of all lengths using the matrix profile. In Proceedings of the 2019 IEEE International Conference on Big Knowledge (ICBK), Beijing, China, 8–11 November 2019; pp. 175–182.
94. TS-Fresh. Available online: https://tsfresh.readthedocs.io/en/latest/ (accessed on 25 February 2022).
95. Ester, M.; Kriegel, H.P.; Sander, J.; Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the KDD, Portland, Oregon, USA, 2–4 August 1996; Volume 96, pp. 226–231.

**Figure 2.** Visualization of the public dataset search process with the number of publications used for the review.

**Figure 3.** Domain distribution of the datasets which were identified while screening the “Web of Science” publication corpus.

**Figure 5.** Samples from all datasets from Table 6 with a maximum length of 100 k.

**Table 1.** Comparison of how the surveys identified in this paper analyzed time series datasets. A tick indicates that the corresponding publication met the condition of the column.

Paper | Datasets with References | Easy access to the datasets | Multiple Datasets | Cross-domain | Comparison of Datasets | Dataset Statistics | Dataset Analysis |
---|---|---|---|---|---|---|---|
Ahmed et al. [12] | | | | | | | |
Aslam et al. [13] | ✓ | ✓ | ✓ | ✓ | | | |
Chandra et al. [14] | ✓ | ✓ | ✓ | | | | |
Chen et al. [15] | | | | | | | |
Dikshit et al. [16] | | | | | | | |
Ghalehkhondabi et al. [17] | | | | | | | |
Lara-Benitez et al. [18] | ✓ | ✓ | ✓ | ✓ | ✓ | | |
Liu et al. [19] | | | | | | | |
Mosavi et al. [20] | ✓ | | | | | | |
Sengupta et al. [21] | ✓ | ✓ | ✓ | ✓ | | | |
Somu et al. [22] | | | | | | | |
Sun and Scanlon [23] | | | | | | | |
Wang et al. [24] | | | | | | | |
Wei et al. [25] | ✓ | | | | | | |
Weiss et al. [26] | ✓ | | | | | | |
Zambrano et al. [27] | ✓ | ✓ | | | | | |

Astro | Machine Sensor | Mortality rate | Fertility rate |

Physics Simulation | Supply Chain | Tourism | NLP |

Crime | Garbage/Waste Prediction | Gas Consumption | Mobile Network |

Network Security | Trend Forecasting | Yield Prediction | Machine Sensor |

Chemicals | Cloud Load | AD Exchange | Bike-sharing |

Non-linear Problems | Web Traffic |

**Table 3.** Links to the datasets shown in Table 4. The web links can be used to retrieve these datasets.

**Table 4.** A general overview of the public datasets found through the paper screening of “Web of Science” and “papers with code” as defined in Section 3.1. The coding of the column “Data Structure” is defined in Table 5 with the underlying structure (FileDatastructure/DatasetDescription/Timestamp).

ID | Domain | Data Structure | File Format | # Data Points | # Dimensions | Time Interval | Paper |
---|---|---|---|---|---|---|---|
0 | Windspeed | (−/−/−) | csv | 105,119 | 51 | 5 min | [36] |
1 | Electricity | (−/−/−) | csv | 105,119 | 31 | 5 min | [36] |
2 | Air Quality | (+/+/+) | csv | 43,824 | 12 | 1 h | [37,38,39,40] |
3 | Electricity | (+/+/+) | csv | 2,075,259 | 8 | 1 min | [37,41,42,43] |
4 | Air Quality | (+/+/+) | xlsx | 9471 | 16 | 1 h | [37,44] |
5 | Air Quality | (+/+/+) | csv | 2,891,393 | 7 | 1 h | [38] |
6 | Traffic | (+/o/−) | txt | 3,997,413 | 11 | 1 h | [2,37,45,46] |
7 | Crime | (+/+/+) | csv | 2,678,959 | 15 | irregular | [47] |
8 | Weather | (+/+/+) | txt | 2764 | 24 | 15 min | [48] |
9 | Ozone Level | (+/o/+) | csv | 2536 | 74 | 1 h | [44] |
10 | Fertility | (+/+/+) | rda | 574 | 4 | 1 yr | [49] |
11 | Mortality | (+/+/+) | csv | 21,201 | 8 | 1 yr | [49] |
12 | Weather, Bike-Sharing | (+/+/+) | csv | 731 | 15 | 1 d | [50] |
13 | Weather, Bike-Sharing | (+/+/+) | csv | 17,379 | 16 | 1 h | [50] |
14 | Electricity, Weather | (+/+/+) | xlsx | 713 | 3 | 1 d | [48] |
15 | Weather | (+/+/+) | xlsx | 15,072 | 12 | 1 h | [48] |
16 | Machine Sensor | (−/o/−) | txt | - | - | 100 ms | [51] |
17 | AD Exchange Rate | (+/o/+) | csv | 9610 | 3 | 1 h | [51] |
18 | Multiple | (+/o/+) | csv | 69,561 | 3 | 5 min | [51] |
19 | Traffic | (+/o/+) | csv | 15,664 | 3 | 5 min | [51] |
20 | Cloud Load | (+/o/+) | csv | 67,740 | 3 | 5 min | [51] |
21 | Tweet Count | (+/o/+) | csv | 158,631 | 3 | 5 min | [51] |
22 | Synthetic | (+/+/−) | mat | - | - | | [52] |
23 | Electricity | (+/−/−) | txt | 140,256 | 370 | 15 min | [45,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,45] |
24 | Exchange Rate | (+/−/−) | txt | 7587 | 7 | 1 d | [53,54,57,58,64,65,66,67] |
25 | Traffic | (+/−/−) | txt | 17,543 | 861 | 1 h | [53,57,58,64,65,66,67] |
26 | Solar | (+/−/−) | txt | 52,559 | 136 | 10 min | [53,63,64,65] |
27 | Weather | (+/+/+) | csv | - | - | 1 min | [68] |
28 | Water Level | (+/+/+) | xlsx | 36,160 | 4 | 1 d | [69] |
29 | Air Quality | (+/+/+) | csv | 420,768 | 19 | 15 min | [62,70] |
30 | Air Quality | (+/+/+) | csv | 79,559 | 11 | 15 min | [50,71] |
31 | Crime | (+/+/+) | csv | 2,129,525 | 34 | 1 min | [47] |
32 | Chemicals | (+/+/+) | xlsx | 120,630 | 7 | 1 min | [72] |
33 | Multiple | (+/−/−) | txt | 71 | 110 | 1 M. | [18] |
34 | Multiple | (+/+/+) | txt | 167,562 | 3 | 1 yr, 1 q, 1 m | [18,73,74,75,76] |
35 | Traffic | (+/+/−) | xls | - | - | 1 d | [18,73] |
36 | Tourism | (+/−/−) | csv | 309 | 794 | 1 m, 1 q | [18] |
37 | Web Traffic | (+/+/+) | csv | 290,126 | 804 | 1 d | [18,73] |
38 | Multiple | (+/o/+) | csv | 414 | 960 | 1 yr, 1 q, 1 m, 1 w, 1 d, 1 h | [2,18,45,46,73,74,75,76,77,78] |
39 | Machine Sensor | (+/+/+) | csv | 34,840 | 9 | 1 h, 1 m | [54,55,56,57,58,59,60,79,80,81] |
40 | Synthetic | (−/−/−) | pickle | - | - | | [82] |
41 | Electricity | (+/+/+) | csv | 4,055,880 | 6 | 5 min, 1 h | [2,45,53,54,63,77,53] |
42 | Weather | (+/+/+) | csv | 633,494,597 | 125 | 1 yr | [83,84] |
43 | Electricity | (+/−/+) | csv | 257,896 | 27 | 1 h | [85,86] |
44 | Trajectory | (+/+/+) | txt | 8,241,680 | 14 | 1 s | [87] |
45 | Wind | (+/+/−) | csv | 262,968 | 254 | 1 h | [2,45] |
46 | Bike-Usage | (+/+/+) | csv | 52,584 | 5 | 1 h | [77] |
47 | Electricity | (+/+/+) | csv | 48,048 | 16 | 1 h | [80] |
48 | Illness | (+/+/+) | csv | 966 | 7 | 1 w | [57,58,66,67] |
49 | Sales | (+/+/+) | csv | 1,058,297 | 9 | 1 d | [43] |
50 | Weather | (+/+/+) | csv | 52,696 | 21 | 10 min | [57,58,66,67,81] |
51 | Traffic | (+/−/−) | mat | 57,636 | 48 | 1 h | [43] |
52 | Weather | (+/+/+) | csv | 35,064 | 12 | 1 h | [60,64] |

**Table 5.** Description of the column “Data Structure” (File Data Structure/Dataset Description/Timestamp) from Table 4.

Symbol | File Data Structure | Dataset Description | Timestamp |
---|---|---|---|

+ | One file or multiple files with a clear structure and documentation. | The dataset contains a description of every field, sufficient to understand all fields. | A timestamp, date, or other date-related column is defined. |

o | The dataset contains different placeholders in the data, which are explained later in a description. | | |

− | Multiple files in different directories without any obvious order or relation to each other. | There is no field description, or it is incomplete. | No timestamp, date, or other date-related column is defined. |
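As an illustration of how the three-part rating in the “Data Structure” column can be read programmatically, the following minimal sketch splits a cell such as `(+/o/+)` into one entry per criterion. The function name `parse_rating` and the labels `good`, `partial`, and `poor` are hypothetical and chosen only for this example; they are not part of the survey.

```python
from typing import Dict

# Illustrative labels for the three symbols of Table 5 (assumed wording, not from the survey).
SYMBOLS = {"+": "good", "o": "partial", "-": "poor"}

# Order of the criteria as given in the caption of Table 5.
FIELDS = ("file_data_structure", "dataset_description", "timestamp")


def parse_rating(cell: str) -> Dict[str, str]:
    """Split a cell such as '(+/o/+)' into one label per criterion."""
    # The tables use the typographic minus (U+2212); normalize it to ASCII '-'.
    parts = cell.strip().strip("()").replace("\u2212", "-").split("/")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected three ratings, got {cell!r}")
    return {field: SYMBOLS[part.strip()] for field, part in zip(FIELDS, parts)}


if __name__ == "__main__":
    print(parse_rating("(+/o/+)"))
```

A structured representation like this makes it straightforward to filter Table 4, for example, for datasets that define a timestamp column.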

ID | Time Interval | Domain | # Data Points | # Dimensions | Forecasting Value | ADF | AC | PRV |
---|---|---|---|---|---|---|---|---|

2 | 1 h | Air Quality | 43,824 | 12 | pm2.5 | 0.0000 | 0.4332 | 0.8795 |

3 | 1 min | Electricity | 2,075,259 | 8 | global_active_power | 0.0000 | 0.7028 | 0.9088 |

4 | 1 h | Air Quality | 9471 | 16 | pt08.s1(co) | 0.0000 | 0.4333 | 0.8714 |

5 | 1 h | Air Quality | 2,891,393 | 7 | pm25_concentration | 0.0000 | 0.3299 | 0.9154 |

10 | 1 yr | Fertility | 574 | 4 | fert-female | 0.2438 | 0.3113 | 0.2000 |

11 | 1 yr | Mortality | 21,201 | 8 | mort-female | 0.0179 | 0.0851 | 0.0896 |

12 | 1 d | Weather, Bike-Sharing | 731 | 15 | cnt | 0.3427 | 0.6827 | 0.0503 |

13 | 1 h | Weather, Bike-Sharing | 17,379 | 16 | cnt | 0.0000 | 0.0963 | 0.8872 |

17 | 1 h | AD Exchange Rate | 9610 | 3 | value | 0.0032 | 0.0085 | 0.0000 |

18 | 5 min | Multiple | 69,561 | 3 | value | 0.0000 | 0.9195 | 0.0000 |

19 | 5 min | Traffic | 15,664 | 3 | value | 0.0000 | 0.0852 | 0.5500 |

20 | 5 min | Cloud Load | 67,740 | 3 | value | 0.0000 | 0.0496 | 0.0584 |

21 | 5 min | Tweet Count | 158,631 | 3 | value | 0.0000 | 0.1992 | 0.7619 |

28 | 1 d | Water Level | 36,160 | 4 | dailyrunoff | 0.0000 | 0.1435 | 0.8597 |

29 | 15 min | Air Quality | 420,768 | 19 | pm2.5 | 0.0000 | 0.8577 | 0.7647 |

30 | 15 min | Air Quality | 79,559 | 11 | value | 0.0000 | 0.5948 | 0.8852 |

39 | 1 h | Machine Sensor | 34,840 | 9 | ot | 0.0052 | 0.5594 | 0.8544 |

41 | 5 min, 1 h | Electricity | 4,634,040 | 6 | power(mw) | 0.0000 | 0.6134 | 0.9773 |

49 | 1 d | Sales | 1,058,297 | 9 | sales | 0.0000 | 0.2742 | 0.8488 |

52 | 1 h | Weather | 35,064 | 12 | wetbulbcelsius | 0.0000 | 0.7422 | 0.9557 |
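To make the ADF column easier to interpret, the sketch below lists the datasets for which a unit root cannot be rejected. It assumes that the ADF column reports the p-value of an augmented Dickey-Fuller test and that a significance level of 0.05 is used; both are assumptions made for this example, as are the names `ADF_BY_ID` and `unit_root_not_rejected`. The values are copied from the table above.

```python
# ADF values per dataset ID, copied from the table above.
ADF_BY_ID = {
    2: 0.0000, 3: 0.0000, 4: 0.0000, 5: 0.0000, 10: 0.2438,
    11: 0.0179, 12: 0.3427, 13: 0.0000, 17: 0.0032, 18: 0.0000,
    19: 0.0000, 20: 0.0000, 21: 0.0000, 28: 0.0000, 29: 0.0000,
    30: 0.0000, 39: 0.0052, 41: 0.0000, 49: 0.0000, 52: 0.0000,
}

ALPHA = 0.05  # assumed significance level, not stated in the table


def unit_root_not_rejected(adf_by_id, alpha=ALPHA):
    """Return the IDs whose ADF value exceeds alpha, in ascending order.

    Under the p-value assumption, these are the series for which the
    null hypothesis of a unit root (non-stationarity) is not rejected.
    """
    return sorted(ds_id for ds_id, p in adf_by_id.items() if p > alpha)


if __name__ == "__main__":
    print(unit_root_not_rejected(ADF_BY_ID))  # prints [10, 12]
```

Under these assumptions, only the yearly fertility dataset (ID 10) and the daily bike-sharing dataset (ID 12) would be flagged, both of which have fewer than 1000 data points.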

**Table 7.**Overview of the identified clusters and the corresponding statistical features of the dataset.

ID | Domain | Time Interval | # Data Points | # Dimensions | ADF | AC | PRV |
---|---|---|---|---|---|---|---|

Outliers | |||||||

10 | Fertility | 1 yr | 574 | 4 | 0.2438 | 0.3113 | 0.2000 |

12 | Weather, Bike-Sharing | 1 d | 731 | 15 | 0.3427 | 0.6827 | 0.0503 |

18 | Multiple | 5 min | 69,561 | 3 | 0.0000 | 0.9195 | 0.0000 |

19 | Traffic | 5 min | 15,664 | 3 | 0.0000 | 0.0852 | 0.5500 |

29 | Air Quality | 15 min | 420,768 | 19 | 0.0000 | 0.8577 | 0.7647 |

Cluster 1 | |||||||

2 | Air Quality | 1 h | 43,824 | 12 | 0.0000 | 0.4332 | 0.8795 |

4 | Air Quality | 1 h | 9471 | 16 | 0.0000 | 0.4333 | 0.8714 |

5 | Air Quality | 1 h | 2,891,393 | 7 | 0.0000 | 0.3299 | 0.9154 |

13 | Weather, Bike-Sharing | 1 h | 17,379 | 16 | 0.0000 | 0.0963 | 0.8872 |

21 | Tweet Count | 5 min | 158,631 | 3 | 0.0000 | 0.1992 | 0.7619 |

28 | Water Level | 1 d | 36,160 | 4 | 0.0000 | 0.1435 | 0.8597 |

39 | Machine Sensor | 1 h, 1 m | 34,840 | 9 | 0.0052 | 0.5594 | 0.8544 |

49 | Sales | 1 d | 1,058,297 | 9 | 0.0000 | 0.2742 | 0.8488 |

Cluster 2 | |||||||

3 | Electricity | 1 min | 2,075,259 | 8 | 0.0000 | 0.7028 | 0.9088 |

30 | Air Quality | 15 min | 79,559 | 11 | 0.0000 | 0.5948 | 0.8852 |

41 | Electricity | 5 min, 1 h | 4,826,760 | 6 | 0.0000 | 0.6134 | 0.9773 |

52 | Weather | 1 h | 35,064 | 12 | 0.0000 | 0.7422 | 0.9557 |

Cluster 3 | |||||||

11 | Mortality | 1 yr | 21,201 | 8 | 0.0179 | 0.0851 | 0.0896 |

17 | AD Exchange Rate | 1 h | 9610 | 3 | 0.0032 | 0.0085 | 0.0000 |

20 | Cloud Load | 5 min | 67,740 | 3 | 0.0000 | 0.0496 | 0.0584 |
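The differences between the three clusters in Table 7 can be summarized by averaging the statistical features per cluster. The following sketch does only that: it takes the (ADF, AC, PRV) triples from the table and computes per-cluster means. It does not reproduce the clustering procedure itself, and the names `CLUSTERS` and `cluster_means` are hypothetical.

```python
from statistics import fmean

# (ADF, AC, PRV) per dataset, copied from Table 7; outliers are omitted.
CLUSTERS = {
    "Cluster 1": [
        (0.0000, 0.4332, 0.8795),  # ID 2
        (0.0000, 0.4333, 0.8714),  # ID 4
        (0.0000, 0.3299, 0.9154),  # ID 5
        (0.0000, 0.0963, 0.8872),  # ID 13
        (0.0000, 0.1992, 0.7619),  # ID 21
        (0.0000, 0.1435, 0.8597),  # ID 28
        (0.0052, 0.5594, 0.8544),  # ID 39
        (0.0000, 0.2742, 0.8488),  # ID 49
    ],
    "Cluster 2": [
        (0.0000, 0.7028, 0.9088),  # ID 3
        (0.0000, 0.5948, 0.8852),  # ID 30
        (0.0000, 0.6134, 0.9773),  # ID 41
        (0.0000, 0.7422, 0.9557),  # ID 52
    ],
    "Cluster 3": [
        (0.0179, 0.0851, 0.0896),  # ID 11
        (0.0032, 0.0085, 0.0000),  # ID 17
        (0.0000, 0.0496, 0.0584),  # ID 20
    ],
}


def cluster_means(clusters):
    """Return the column-wise mean (ADF, AC, PRV) for every cluster."""
    return {
        name: tuple(fmean(column) for column in zip(*rows))
        for name, rows in clusters.items()
    }


if __name__ == "__main__":
    for name, (adf, ac, prv) in cluster_means(CLUSTERS).items():
        print(f"{name}: ADF={adf:.4f} AC={ac:.4f} PRV={prv:.4f}")
```

Read this way, Cluster 2 combines high AC with high PRV, Cluster 1 has similarly high PRV but lower AC on average, and Cluster 3 is low on both.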

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Hahn, Y.; Langer, T.; Meyes, R.; Meisen, T.
Time Series Dataset Survey for Forecasting with Deep Learning. *Forecasting* **2023**, *5*, 315-335.
https://doi.org/10.3390/forecast5010017
