# Searching for Promisingly Trained Artificial Neural Networks


## Abstract


## 1. Introduction

- Image-based methods utilize ground-based systems such as sky cameras, Doppler weather radar systems, LiDAR optical systems, or satellite imagery to predict the cloud cover essential for solar irradiance and power forecasting. Within this category, scientists often incorporate physical models or numerical weather predictions.
- Statistical models focus on analyzing the intrinsic characteristics of the target time series. Commonly employed methods in this category include the auto-regressive method and its vectorial variant, the Markov chain approach, and the analog ensemble.
- Machine learning-based techniques encompass methods that employ ANNs, like extreme learning machines or convolutional neural networks, to capture the non-linear characteristics of the variable under prediction. Other techniques in this category include support vector machines, decision trees, and Gaussian processes.
- Decomposition-based methods cover variational, empirical, and wavelet decompositions, as well as approaches reliant on the Fourier transform.

- How can scientists accurately estimate local minima during network training?
- What quantitative advantages are gained by identifying the local minima of the objective function during network training?
- How do existing training methods in the literature leverage detailed knowledge of the objective function’s local minima?

## 2. Materials and Methods

- The number of experiments per batch ($I$)
- The total number of batches ($J$)
- The number of discretization bins ($L$)
- The vector retaining the $J-1$ similarity index values ($s$)
- The tolerance threshold ($\alpha$)
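Taken together, these inputs suggest a batch-wise search: train $I$ networks per batch, pool their validation RMSE values, discretize the pooled sample into $L$ bins, and stop once the empirical distribution stabilizes within tolerance $\alpha$. The sketch below is a minimal illustration of that idea, not the paper's exact procedure; in particular, the choice of total variation distance as the similarity index and the function name `batch_search` are assumptions.

```python
import numpy as np

def batch_search(run_experiment, I=10, J=250, L=100, alpha=0.05):
    """Batch-wise training search with a histogram-similarity stopping rule.

    run_experiment() is assumed to train one network and return its
    validation RMSE. I, J, L, and alpha follow the parameter list above.
    """
    rmse = []  # all RMSE values observed so far
    s = []     # the (at most J - 1) similarity index values
    for j in range(J):
        rmse.extend(run_experiment() for _ in range(I))
        if j == 0:
            continue
        # Common bin edges so that successive histograms are comparable.
        edges = np.linspace(min(rmse), max(rmse) + 1e-9, L + 1)
        h_now, _ = np.histogram(rmse, bins=edges)
        h_prev, _ = np.histogram(rmse[:-I], bins=edges)
        h_now = h_now / h_now.sum()
        h_prev = h_prev / h_prev.sum()
        # Similarity index taken here as total variation distance between
        # the RMSE distributions before and after the latest batch.
        s.append(0.5 * np.abs(h_now - h_prev).sum())
        if s[-1] <= alpha:
            break  # distribution has stabilized: stop training new batches
    return min(rmse), s
```

In use, `run_experiment` would wrap one full training run of the network; the returned minimum RMSE identifies the most promisingly trained candidate, and the length of `s` records how many batches were needed before the stopping rule fired.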

## 3. Results

All experiments were executed on a PC equipped with an Intel Core® i7 CPU, 16 GB RAM, and a 64-bit architecture.

### 3.1. Short-Term Wind Power Forecasting

### 3.2. Short-Term Load Forecasting

### 3.3. Short-Term Wind Speed Forecasting

### 3.4. Relationship with Bayesian Model Selection

## 4. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## References


| Parameter | Value |
|---|---|
| Maximum number of training epochs | 50 |
| Learning rate | 0.001 |
| Number of experiments per batch ($I$) | 10 |
| Number of batches ($J$) | 250 |
| Stopping tolerance ($\alpha$) | 0.05 |
| Autoregressive smoothing order ($\beta$) | 3 |
| Number of discretization bins ($L$) | 100 |
| Size of wind power and load training dataset | 10,000 |
| Size of wind power and load validation dataset | 10,000 |
| Size of wind power and load testing dataset | 10,000 |
| Size of wind speed training dataset | 4335 |
| Size of wind speed validation dataset | 2165 |
| Size of wind speed testing dataset | 2165 |

| Parameter | Value |
|---|---|
| Number of hidden units | 30 |
| Minimum RMSE (MW) | 151.1853 |
| Average RMSE (MW) | 152.1854 |
| Maximum RMSE (MW) | 155.7873 |
| Probability ($\Gamma$) | 13/2500 |
| Computational time (HH:MM:SS) | 01:51:01 |
| Persistent RMSE (MW) | 196.9942 |

| Parameter | Value |
|---|---|
| RMSE (MW) | 160.9722 |
| Persistent RMSE (MW) | 202.0250 |

| Parameter | Value |
|---|---|
| Number of hidden units | 5 |
| Minimum RMSE (MW) | 329.3708 |
| Average RMSE (MW) | 360.1873 |
| Maximum RMSE (MW) | 2326.8992 |
| Probability ($\Gamma$) | 12/2500 |
| Computational time (HH:MM:SS) | 00:14:46 |
| Persistent RMSE (MW) | 347.5793 |

| Parameter | Value |
|---|---|
| RMSE (MW) | 323.6091 |
| Persistent RMSE (MW) | 342.3337 |

| Parameter | Value | |
|---|---|---|
| Number of hidden units | 25 | |
| Network type | FFNN | LSTM |
| Minimum RMSE (m/s) | 1.0635 | 1.0687 |
| Average RMSE (m/s) | 1.0969 | 1.1808 |
| Maximum RMSE (m/s) | 1.1996 | 1.4474 |
| Probability ($\Gamma$) | 11/200 | 25/130 |
| Computational time (HH:MM:SS) | 00:02:13 | 04:36:56 |
| Persistent RMSE (m/s) | 1.1814 | |

| Parameter | Value | |
|---|---|---|
| Network type | FFNN | LSTM |
| RMSE (m/s) | 1.0050 | 1.0104 |
| Persistent RMSE (m/s) | 1.0745 | |

| Parameter | Value |
|---|---|
| Maximum number of training epochs | 50 |
| Learning rate | 0.001 |
| Number of experiments per batch ($I$) | 10 |
| Number of batches ($J$) | 250 |
| Stopping tolerance ($\alpha$) | 0 |
| Autoregressive smoothing order ($\beta$) | 3 |
| Number of discretization bins ($L$) | 100 |
| Size of training dataset | 100 |
| Size of validation dataset | 100 |
| Size of testing dataset | 100 |

| Case | Hypothesis ($\overline{k}$) | $\alpha$ | Probability ($\Gamma$) |
|---|---|---|---|
| Ishigami function (FFNN) | 118 | 0 | 0.004 |
| Wind power forecasting (FFNN) | 39 | 0 | 0.0052 |
| Load demand forecasting (FFNN) | 8 | 0 | 0.0048 |
| Wind speed (FFNN) | 176 | 0.05 | 0.055 |
| Wind speed (LSTM) | 56 | 0.05 | 0.1923 |

| Case | Hypothesis ($\overline{k}$) | $\alpha$ | Probability ($\Gamma$) |
|---|---|---|---|
| Wind speed (FFNN) | 176 | 0.05 | 0.055 |
| Wind speed (FFNN) | 400 | 0.01 | 0.0254 |


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Lujano-Rojas, J.M.; Dufo-López, R.; Artal-Sevil, J.S.; García-Paricio, E.
Searching for Promisingly Trained Artificial Neural Networks. *Forecasting* **2023**, *5*, 550-575.
https://doi.org/10.3390/forecast5030031

**AMA Style**

Lujano-Rojas JM, Dufo-López R, Artal-Sevil JS, García-Paricio E.
Searching for Promisingly Trained Artificial Neural Networks. *Forecasting*. 2023; 5(3):550-575.
https://doi.org/10.3390/forecast5030031

**Chicago/Turabian Style**

Lujano-Rojas, Juan M., Rodolfo Dufo-López, Jesús Sergio Artal-Sevil, and Eduardo García-Paricio.
2023. "Searching for Promisingly Trained Artificial Neural Networks" *Forecasting* 5, no. 3: 550-575.
https://doi.org/10.3390/forecast5030031