# RAdam-DA-NLSTM: A Nested LSTM-Based Time Series Prediction Method for Human–Computer Intelligent Systems


## Abstract


## 1. Introduction

- We introduce Nested LSTM (NLSTM), an internal LSTM unit structure designed to guide memory forgetting and memory selection. By incorporating NLSTM as the memory cell of the LSTM, we enhance prediction accuracy.
- We develop an autoencoder network based on a dual-stage attention mechanism (DA-NLSTM). The network combines an NLSTM encoder with an input attention mechanism and an NLSTM decoder with a temporal attention mechanism. This design addresses the attention-dispersion issue of traditional LSTM architectures, effectively captures long-term temporal dependencies in time series data, and enhances feature aggregation within the network.
- We employ the RAdam optimizer to optimize the objective function. RAdam dynamically switches between Adam-style and SGD-style updates according to the variance of the adaptive learning rate, and introduces a rectifier term to improve the model's stability.
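The nested memory-cell idea in the first contribution can be sketched in NumPy: the outer cell's usual additive memory update $c_t = f_t \odot c_{t-1} + i_t \odot g_t$ is replaced by an inner LSTM that receives $i_t \odot g_t$ as input and $f_t \odot c_{t-1}$ as its previous hidden state. The class below is an illustrative sketch only; gate shapes, bias omission, and random initialization are assumptions, not the paper's exact parameterization.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class NestedLSTMCell:
    """Minimal nested LSTM cell sketch: the outer cell delegates its
    memory update to an inner LSTM instead of a plain additive update."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        def w(rows, cols):
            return rng.standard_normal((rows, cols)) * 0.1
        self.hidden_size = hidden_size
        z = input_size + hidden_size                  # outer input: [x, h]
        self.Wi, self.Wf, self.Wo, self.Wg = (w(hidden_size, z) for _ in range(4))
        zi = 2 * hidden_size                          # inner input: [i*g, f*c]
        self.Vi, self.Vf, self.Vo, self.Vg = (w(hidden_size, zi) for _ in range(4))

    def step(self, x, h, c, c_inner):
        z = np.concatenate([x, h])
        i = sigmoid(self.Wi @ z)                      # outer input gate
        f = sigmoid(self.Wf @ z)                      # outer forget gate
        o = sigmoid(self.Wo @ z)                      # outer output gate
        g = np.tanh(self.Wg @ z)                      # outer candidate
        # Inner LSTM replaces the additive update c = f*c + i*g:
        # its "input" is i*g and its "previous hidden state" is f*c.
        zi = np.concatenate([i * g, f * c])
        ii = sigmoid(self.Vi @ zi)
        fi = sigmoid(self.Vf @ zi)
        oi = sigmoid(self.Vo @ zi)
        gi = np.tanh(self.Vg @ zi)
        c_inner = fi * c_inner + ii * gi              # inner memory
        c = oi * np.tanh(c_inner)                     # inner hidden -> outer memory
        h = o * np.tanh(c)                            # outer hidden state
        return h, c, c_inner

# usage: one step from a zero state
cell = NestedLSTMCell(input_size=3, hidden_size=5)
h, c, c_inner = cell.step(np.ones(3), np.zeros(5), np.zeros(5), np.zeros(5))
```

The extra inner state gives the cell a second, slower memory pathway, which is what lets the outer gates guide memory forgetting and selection.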

## 2. RAdam-DA-NLSTM

#### 2.1. Nested LSTM

#### 2.2. DA-NLSTM

#### 2.2.1. The NLSTM Encoder Based on Input Attention Mechanism

#### 2.2.2. The NLSTM Decoder Based on Time Attention Mechanism

#### 2.3. RAdam Optimizer

- Step 1.
- Input the step size ${\alpha}_{t}$ and decay rates $\left\{\beta_1, \beta_2\right\}$; set $t=0$.
- Step 2.
- Initialize the moving 1st moment and moving 2nd moment, and calculate the maximum length $\rho_\infty$ of the approximated SMA: $$\rho_\infty = \frac{2}{1-\beta_2} - 1$$
- Step 3.
- Set $t=t+1$; calculate the gradient $g_t$ of the objective function, update the moving 1st moment $m_t$ and moving 2nd moment $v_t$, bias-correct the moving 1st moment $\widehat{m}_t$, and calculate the length $\rho_t$ of the approximated SMA: $$g_t = \nabla_\theta J_t\left(\theta_{t-1}\right)$$ $$m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t$$ $$v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2$$ $$\widehat{m}_t = \frac{m_t}{1-\beta_1^t}$$ $$\rho_t = \rho_\infty - \frac{2t\beta_2^t}{1-\beta_2^t}$$
- Step 4.
- Calculate $\theta_t$ according to $\rho_t$. If $\rho_t > 4$, adopt the Adam optimizer: bias-correct the moving 2nd moment, build the rectifier term $r_t$, and obtain the revised moving 2nd moment $\widehat{v}_t$ and the model parameters $\theta_t$: $$\widehat{v}_t = \sqrt{\frac{v_t}{1-\beta_2^t}}$$ $$r_t = \sqrt{\frac{(\rho_t-4)(\rho_t-2)\,\rho_\infty}{(\rho_\infty-4)(\rho_\infty-2)\,\rho_t}}$$ $$\theta_t = \theta_{t-1} - \alpha_t r_t \frac{\widehat{m}_t}{\widehat{v}_t}$$ If $\rho_t \le 4$, adopt the SGD+Momentum optimizer to obtain the training parameters $\theta_t$: $$\theta_t = \theta_{t-1} - \alpha_t \widehat{m}_t$$
- Step 5.
- Output the model parameters ${\theta}_{t}$.
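Steps 1–5 above can be sketched as a single NumPy update function. This is a minimal sketch of the RAdam rule, not the paper's training code; the `eps` numerical-stability constant and the default hyperparameter values are assumptions the steps above do not state.

```python
import numpy as np

def radam_step(theta, grad, m, v, t,
               alpha=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One RAdam parameter update following Steps 2-4 above.
    Falls back to an SGD-with-momentum style update while the
    variance of the adaptive learning rate is intractable (rho_t <= 4)."""
    rho_inf = 2.0 / (1.0 - beta2) - 1.0               # max SMA length
    m = beta1 * m + (1.0 - beta1) * grad              # moving 1st moment
    v = beta2 * v + (1.0 - beta2) * grad ** 2         # moving 2nd moment
    m_hat = m / (1.0 - beta1 ** t)                    # bias-corrected 1st moment
    rho_t = rho_inf - 2.0 * t * beta2 ** t / (1.0 - beta2 ** t)
    if rho_t > 4.0:                                   # variance tractable: rectified Adam
        v_hat = np.sqrt(v / (1.0 - beta2 ** t))
        r_t = np.sqrt((rho_t - 4.0) * (rho_t - 2.0) * rho_inf /
                      ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t))
        theta = theta - alpha * r_t * m_hat / (v_hat + eps)
    else:                                             # early steps: SGD + momentum
        theta = theta - alpha * m_hat
    return theta, m, v

# usage: minimize f(theta) = theta**2, whose gradient is 2*theta
theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 201):
    theta, m, v = radam_step(theta, 2.0 * theta, m, v, t)
```

With the default $\beta_2 = 0.999$, $\rho_t$ stays below 4 for the first few steps, so early updates take the SGD branch before the rectified Adam branch activates.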

## 3. Experiment and Simulation

#### 3.1. Data Sources

#### 3.2. Parameter Setting

#### 3.3. Comparative Analysis

#### 3.3.1. PM2.5 Prediction

#### 3.3.2. Stock Prediction

#### 3.3.3. Traffic Prediction

#### 3.3.4. Biological Signal Prediction

## 4. Discussion

## 5. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References


| Datasets | L | $m_1$ | $m_2$ | b |
|---|---|---|---|---|
| Beijing PM2.5 | 10 | 64 | 64 | 64 |
| NASDAQ 100 | 15 | 128 | 128 | 64 |
| California traffic volume | 15 | 64 | 64 | 64 |
| Seattle traffic speed | 20 | 64 | 64 | 64 |
| ECG signal | 15 | 128 | 128 | 64 |
| BCG signal | 20 | 128 | 128 | 64 |

| Models | MAE | MAPE | RMSE | $R^2$ | $R^2$ (TS) |
|---|---|---|---|---|---|
| SVM | 1.1451 | 4.6632 | 1.2365 | 0.6532 | 0.7217 |
| RNN | 0.6621 | 3.6754 | 0.7321 | 0.7229 | 0.8323 |
| GRU | 0.6323 | 3.3632 | 0.6941 | 0.7892 | 0.8814 |
| LSTM | 0.6098 | 3.1946 | 0.6380 | 0.7921 | 0.8920 |
| A-LSTM | 0.2324 | 2.8312 | 0.5919 | 0.8126 | 0.8620 |
| DA-LSTM | 0.2032 | 2.7328 | 0.5521 | 0.8181 | 0.8705 |
| RAdam-DA-NLSTM | 0.1921 | 2.5217 | 0.5117 | 0.8213 | 0.8831 |

| Models | MAE | MAPE | RMSE | $R^2$ | $R^2$ (TS) |
|---|---|---|---|---|---|
| SVM | 0.3901 | 0.5013 | 0.5211 | 0.7172 | 0.8272 |
| RNN | 0.2163 | 0.2586 | 0.2681 | 0.7621 | 0.9031 |
| GRU | 0.2031 | 0.2512 | 0.2761 | 0.7663 | 0.9226 |
| LSTM | 0.1821 | 0.1931 | 0.2317 | 0.7874 | 0.9306 |
| A-LSTM | 0.0923 | 0.1034 | 0.1522 | 0.8021 | 0.9155 |
| DA-LSTM | 0.0478 | 0.0526 | 0.0632 | 0.8302 | 0.9207 |
| RAdam-DA-NLSTM | 0.0301 | 0.0516 | 0.0531 | 0.8825 | 0.9361 |

| Models | MAE | MAPE | RMSE | $R^2$ | $R^2$ (TS) |
|---|---|---|---|---|---|
| SVM | 0.2903 | 0.9233 | 0.3218 | 0.7621 | 0.8562 |
| RNN | 0.1642 | 0.5215 | 0.2811 | 0.8054 | 0.9172 |
| GRU | 0.1327 | 0.3291 | 0.2316 | 0.8298 | 0.9238 |
| LSTM | 0.1244 | 0.2282 | 0.1843 | 0.8536 | 0.9423 |
| A-LSTM | 0.0836 | 0.2132 | 0.1641 | 0.8721 | 0.9321 |
| DA-LSTM | 0.0624 | 0.1801 | 0.0912 | 0.8863 | 0.9626 |
| RAdam-DA-NLSTM | 0.0598 | 0.1721 | 0.0825 | 0.9136 | 0.9645 |

| Models | MAE | MAPE | RMSE | $R^2$ | $R^2$ (TS) |
|---|---|---|---|---|---|
| SVM | 0.8931 | 4.6548 | 1.2031 | 0.6518 | 0.7931 |
| RNN | 0.4362 | 3.9325 | 0.9121 | 0.7029 | 0.8216 |
| GRU | 0.4210 | 3.8978 | 0.8945 | 0.7112 | 0.8344 |
| LSTM | 0.3834 | 3.6427 | 0.8649 | 0.7235 | 0.8367 |
| A-LSTM | 0.3756 | 3.5471 | 0.7921 | 0.7616 | 0.8721 |
| DA-LSTM | 0.2921 | 2.9221 | 0.4382 | 0.7915 | 0.8925 |
| RAdam-DA-NLSTM | 0.2651 | 2.6945 | 0.4213 | 0.8120 | 0.9024 |

| Models | MAE | MAPE | RMSE | $R^2$ | $R^2$ (TS) |
|---|---|---|---|---|---|
| SVM | 0.2991 | 3.0238 | 0.4832 | 0.7232 | 0.8136 |
| RNN | 0.2834 | 2.7622 | 0.4025 | 0.7621 | 0.8648 |
| GRU | 0.2802 | 2.3254 | 0.3819 | 0.7796 | 0.8432 |
| LSTM | 0.2725 | 2.2435 | 0.3021 | 0.7728 | 0.8560 |
| A-LSTM | 0.2531 | 1.8432 | 0.2563 | 0.8126 | 0.8922 |
| DA-LSTM | 0.2289 | 1.2003 | 0.2016 | 0.8345 | 0.8837 |
| RAdam-DA-NLSTM | 0.2232 | 1.2189 | 0.1782 | 0.8426 | 0.8856 |

| Models | MAE | MAPE | RMSE | $R^2$ | $R^2$ (TS) |
|---|---|---|---|---|---|
| SVM | 0.5326 | 3.2431 | 0.6238 | 0.6532 | 0.6865 |
| RNN | 0.3211 | 3.0121 | 0.3442 | 0.7254 | 0.7773 |
| GRU | 0.2967 | 2.8320 | 0.3002 | 0.7422 | 0.7932 |
| LSTM | 0.2332 | 2.9233 | 0.3126 | 0.7527 | 0.7942 |
| A-LSTM | 0.2017 | 2.5321 | 0.2812 | 0.7632 | 0.8329 |
| DA-LSTM | 0.1822 | 2.3208 | 0.2636 | 0.7921 | 0.8232 |
| RAdam-DA-NLSTM | 0.1675 | 2.3026 | 0.2431 | 0.8053 | 0.8321 |



## Share and Cite

**MDPI and ACS Style**

Liu, B.; Chen, W.; Wang, Z.; Pouriyeh, S.; Han, M.
RAdam-DA-NLSTM: A Nested LSTM-Based Time Series Prediction Method for Human–Computer Intelligent Systems. *Electronics* **2023**, *12*, 3084.
https://doi.org/10.3390/electronics12143084
