Article

Feature Extraction from Satellite-Derived Hydroclimate Data: Assessing Impacts on Various Neural Networks for Multi-Step Ahead Streamflow Prediction

by Fatemeh Ghobadi, Amir Saman Tayerani Charmchi and Doosun Kang *
Department of Civil Engineering, College of Engineering, Kyung Hee University, 1732 Deogyeong-daero, Giheung-gu, Yongin-si 17104, Gyeonggi-do, Republic of Korea
* Author to whom correspondence should be addressed.
Sustainability 2023, 15(22), 15761; https://doi.org/10.3390/su152215761
Submission received: 4 October 2023 / Revised: 25 October 2023 / Accepted: 2 November 2023 / Published: 9 November 2023

Abstract:
Enhancing the generalization capability of time-series models for streamflow prediction using dimensionality reduction (DR) techniques remains a major challenge in water resources management (WRM). In this study, we investigated eight DR techniques and their effectiveness in mitigating the curse of dimensionality, which hinders the performance of machine learning (ML) algorithms in the field of WRM. Our study delves into the most representative unsupervised DR techniques, including principal component analysis (PCA), kernel PCA (KPCA), multi-dimensional scaling (MDS), isometric mapping (ISOMAP), locally linear embedding (LLE), t-distributed stochastic neighbor embedding (t-SNE), Laplacian eigenmaps (LE), and autoencoder (AE), examining their effectiveness in multi-step ahead (MSA) streamflow prediction. We first conducted a conceptual comparison of these techniques and then evaluated their performance in four different case studies in the USA. Moreover, we assessed the quality of the transformed feature spaces in terms of the resulting improvement in MSA streamflow prediction. Through our investigation, we gained valuable insights into the performance of different DR techniques within linear/dense/convolutional neural network (CNN)/long short-term memory neural network (LSTM) and autoregressive LSTM (AR-LSTM) architectures. This study contributes to a deeper understanding of suitable feature extraction techniques for enhancing the capabilities of the LSTM model in tackling high-dimensional datasets in the realm of WRM.

1. Introduction

Anticipating and understanding hydrological cycle parameters, such as streamflow and precipitation, are of utmost significance for the effective management of water resources [1,2,3,4,5,6,7,8,9,10]. The hydrological cycle, an intricate and ever-changing process owing to climate change, involves a multitude of factors that influence the prediction of hydrological parameters [11,12]. To replicate this complex process, hydrological models have been crafted using diverse approaches, encompassing physics-based and numerical methods, as well as various data-driven techniques [13,14]. Hydrological models are broadly classified into two categories: physical-driven and data-driven models [15,16,17]. Physical models face limitations even in dealing with seemingly straightforward hydrological processes because of the intricate relationships between environmental variables and streamflow [18]. In contrast, data-driven models involve statistical and artificial intelligence (AI) approaches without the need for an explicit physics-based scheme [19]. Such computer-aided, data-driven models facilitate the resolution of real-world problems by offering innovative solutions and insights, which are invaluable in achieving effective water resources management (WRM). Efficient WRM is a critical aspect of environmental and sustainable development and involves the careful monitoring, assessment, and utilization of water systems to ensure the judicious allocation and conservation of water resources [20,21,22,23]. By leveraging the synergies of computer-aided models and advanced analytical tools, it is plausible to enhance the resilience and adaptability of water management systems, thereby aligning them with the overarching objectives of sustainable development and ecological balance [6].

1.1. Research Background and Motivation

With the increasing complexities of water-related issues, data-driven approaches have become essential for decision-making processes [24]. However, the abundance of data in developed countries generally leads to high-dimensional datasets, which present challenges in terms of analysis and interpretation [25]. Moreover, multi-step ahead (MSA) multivariate prediction not only increases model complexity but also aggravates the curse of dimensionality, thereby exacerbating the situation. The predictability of MSA multivariate prediction against MSA univariate prediction has been discussed in several studies [16,26]. MSA multivariate prediction is a complex forecasting approach that considers multiple data series, with each series representing a different variable or feature [16,17,27]. In particular, MSA multivariate prediction models become more susceptible to the curse of dimensionality as the number of variables rises. Each added variable expands the potential feature space, complicating the modeling process, particularly when the relationships between the variables are nonlinear. In terms of model parameters, the introduction of additional variables can lead to a significant increase in parameter count, especially if interactions between variables are considered, resulting in more complex models that require careful tuning and validation. Moreover, owing to the increased model complexity and parameter count, the computational costs of MSA multivariate models are substantially higher than those of MSA univariate and single-step-ahead univariate/multivariate predictions. Another recent challenge arises when the data record is short but the number of available variables is high, which intensifies the curse of dimensionality. Under these conditions, training, especially with complex architectures such as deep neural networks (DNNs) and convolutional neural networks (CNNs), can be significantly more time-consuming and challenging [28].
As feature engineering tools, dimension reduction (DR) and feature selection techniques are crucial for handling these challenges by extracting relevant information and reducing the computational burden while preserving the integrity of the data [29].
DR techniques can be broadly categorized into linear and non-linear approaches. Linear methods such as principal component analysis (PCA) operate by projecting high-dimensional data onto a lower-dimensional subspace using a linear transformation [30]. These transformations are often derived from linear algebraic principles such as eigenvalue decomposition. In contrast, non-linear methods are employed when the data exhibit complex non-linear relationships and structures. Examples of non-linear techniques include kernel PCA (KPCA) [31,32], multi-dimensional scaling (MDS) [33], isometric mapping (ISOMAP) [34], locally linear embedding (LLE) [35], t-distributed stochastic neighbor embedding (t-SNE) [36], Laplacian eigenmaps (LE) [35], and autoencoder (AE) [37]. In particular, these methods aim to preserve the intrinsic geometry or manifold structure of the data while reducing its dimensionality. In addition, DR algorithms can be classified into supervised and unsupervised frameworks based on their learning paradigms. Unsupervised methods focus solely on the feature matrix to identify the underlying patterns without considering any associated labels or responses. In contrast, supervised techniques incorporate both feature and response variables to perform the reduction.
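As a concrete illustration of the linear case, a PCA projection can be sketched with scikit-learn. This is a minimal example on synthetic data, not the study's pipeline; the 31-variable, monthly-record shape mirrors the datasets used later, but the values here are random and the number of components is illustrative.

```python
# Minimal PCA sketch: project a 31-variable monthly dataset onto its
# top-5 linear components (synthetic data for illustration only).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(240, 31))   # 20 synthetic years of monthly records, 31 variables

pca = PCA(n_components=5)
Z = pca.fit_transform(X)         # coordinates in the 5-D eigenvector subspace

print(Z.shape)                   # (240, 5)
print(pca.explained_variance_ratio_.sum())   # fraction of variance retained
```

The `explained_variance_ratio_` attribute is the usual diagnostic for choosing how many components to keep.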
The overarching objective of employing DR techniques is to improve the efficacy and computational efficiency of data analysis tasks [38]. By reducing the dimensionality of the dataset, these methods aim to enhance data quality and facilitate more accurate data mining processes. However, as water-related issues have become increasingly complex and interconnected, traditional approaches have been rendered inadequate for handling the vast and diverse datasets generated by modern monitoring and sensing technologies. This paradigm shift has led to the adoption of hybridized data-driven approaches, which leverage advanced data analytics, machine learning (ML), and statistical techniques to formulate informed decisions. The advent of remote sensing, IoT devices, and continuous monitoring systems has resulted in a data deluge, leading to high-dimensional datasets. These datasets may encompass a wide array of water-related parameters; however, analyzing such massive datasets can be computationally intensive, expensive, and challenging, making it difficult to extract meaningful insights. In this context, DR techniques emerge as an advantageous tool to address this challenge. DR techniques allow high-dimensional datasets to be converted into lower-dimensional representations while preserving essential information. By reducing the dimensionality of data, these techniques enable more efficient data transformation, processing, visualization, and analysis. Thus, DR and feature selection methods enable water resources managers and researchers to explore and analyze data efficiently, gain valuable insights, and identify trends, anomalies, and patterns. This knowledge contributes to evidence-based decision making, leading to more effective and sustainable WRM practices. These techniques facilitate the integration of disparate data sources by combining information from various monitoring networks, satellites, and environmental models. 
Integrated data can reveal complex interrelationships between different hydrological parameters and assist in predicting future water quantity, availability, quality, and potential risks.

1.2. Literature Review

Rapid advancements in artificial intelligence (AI) and ML have revolutionized various fields including data analytics, pattern recognition, and decision-making. In this regard, strategies for mitigating the spatial and temporal variability of hydrological variables, addressing data gaps and uncertainties, and ensuring the interpretability and accuracy of reduced data are ongoing areas of research. Notably, WRM relies on data-driven approaches to address the complexities of modern hydrology-related issues. In this regard, DR and feature selection techniques are instrumental in understanding large and diverse datasets, enabling efficient analysis and decision-making while maintaining data integrity. By harnessing the power of these techniques along with deep learning (DL) algorithms, water resources managers can better understand and optimize resource allocation and implement sustainable practices for the benefit of both people and the environment. Various DR techniques have been employed for time-series data analysis as boosters to enhance prediction performance across diverse disciplines, including wind power forecasting [39,40], solar irradiation forecasting [41], energy consumption forecasting [42], and fluid flow modeling [19], serving as important feature engineering procedures to eliminate high dimensionality. In the application of DR techniques to daily runoff forecasting, Zhang et al. [43] employed PCA to minimize dimensionality and mitigate data redundancy across six distinct input scenarios. Their findings underscored that scenarios integrated with PCA consistently outperformed those devoid of PCA application, underpinning the efficacy of DR methods in isolating pivotal information from raw data into synthesized variables while concurrently obviating extraneous data perturbations. Furthermore, Chang et al. [44] employed a combination of PCA with the Self-Organizing Map (SOM) and Non-linear Autoregressive with Exogenous Inputs (NARX) to distill the principal components emblematic of disparate spatial distributions inherent to urban inundation. Comparative analyses revealed that the PCA-SOM-NARX framework demonstrated superior predictive outcomes relative to the SOM-NARX baseline, highlighting the pivotal role of PCA in enhancing prediction efficacy. Additionally, Haddad and Rahman [45] employed both linear DR techniques, specifically canonical correlation analysis (CCA), and non-linear methods, notably MDS and kernel CCA, for regional flood frequency analysis. The empirical results of their research illustrated that non-linear methodologies yielded superior performance over their linear counterparts.
To the best of our knowledge, non-linear unsupervised DR techniques have not been extensively used in the field of hydrology. Furthermore, the various techniques for addressing the curse of dimensionality have not been thoroughly examined within the hydrological domain, particularly for multi-step ahead streamflow prediction. The issue of dimensionality becomes particularly pronounced when employing recurrent layers and recursive algorithms. To address these gaps, this study explored the effectiveness of eight DR techniques in improving the accuracy of MSA monthly streamflow predictions and reducing the effect of the curse of dimensionality using four case studies in the USA. In this study, we propose hybrid models that combine five prediction architectures with the eight DR techniques. The hybrid models comprise combinations of linear/dense/CNN/LSTM and AR-LSTM architectures with seven non-linear unsupervised approaches (KPCA, MDS, LLE, ISOMAP, LE, t-SNE, and AE), with PCA serving as a linear benchmark. This study represents a pioneering effort to explore the state-of-the-art performance of the MDS, LLE, ISOMAP, LE, and t-SNE techniques for MSA streamflow prediction. By leveraging the strengths of these DR techniques in conjunction with the predictive power of the prediction algorithms, we aimed to enhance the accuracy of MSA streamflow predictions for more efficient WRM. Moreover, through the integration of advanced DR techniques, we focused on enhancing MSA streamflow predictions, thereby contributing to the advancement of modeling and forecasting techniques.
The remainder of this paper is organized as follows. Section 2 presents the following materials and methods in our study: multidimensional scaling (Section 2.1), kernel principal component (Section 2.2), local linear embedding (Section 2.3), isometric mapping (Section 2.4), Laplacian eigenmap (Section 2.5), t-distributed stochastic neighbor (Section 2.6), autoencoder (Section 2.7), experimental setting (Section 2.8), and performance evaluation metrics (Section 2.9). Section 3 presents a case study. The results are presented in Section 4 with three subsections focusing on data preprocessing (Section 4.1), informative feature synthesis through dimensionality reduction (Section 4.2), and prediction model evaluation (Section 4.3). A discussion is presented in Section 5 and contribution to future studies is presented in Section 6. Based on the study findings, conclusions are drawn in Section 7 along with directions for future research.

2. Materials and Methods

This section outlines the approach and methodology employed in our research to enhance MSA streamflow prediction using various neural network architectures coupled with seven non-linear unsupervised DR techniques. To achieve this coupled strategy, we integrated the DR techniques with linear, dense, CNN, LSTM, and AR-LSTM layers. By combining these architectures, we aimed to improve the accuracy and efficiency of MSA streamflow prediction, enabling informed decision making and sustainable WRM. The subsequent subsections discuss the specific details of the DR techniques employed and the overall methodology adopted in our research. In particular, we examine the selection of case studies from diverse regions within the USA, each characterized by varying levels of annual precipitation. As these regions encompass distinct hydrological behaviors and challenges, they are ideal candidates for evaluating the efficacy of our approach.

2.1. Multidimensional Scaling (MDS)

MDS, commonly referred to as Principal Coordinate Analysis, is a global non-linear technique used to maintain the global characteristics of data [46]. MDS is utilized for the visualization of high-dimensional data (d dimensions) within a reduced-dimensional space (s dimensions) while maintaining the pairwise distances between data points [47]. The fundamental objective of MDS is to create a representation of the data that encapsulates similarities or disparities among data points within a multidimensional context. The operational framework of MDS encompasses the mapping of data points into lower dimensions through the computation of a dissimilarity or distance matrix. This matrix ensures the aggregation of similar data points and maintains spatial separation among dissimilar data points. By employing the distance matrix $d_X$ as a foundational reference, MDS identifies the output $Y$ that optimizes the congruence between $d_X$ and $d_Y$. Herein, $d_X = \|x_i - x_j\|$ and $d_Y = \|y_i - y_j\|$ signify the distances between any given pair of points $i$ and $j$. During this progression, MDS transforms the primary distance matrix $d_X$ into a kernel matrix $K$, defined as follows [33]:
$$K = H\,d_X\,H \quad (1)$$
where $H = I - \frac{1}{N} e e^T$ represents a centering matrix, $I$ represents the identity matrix, and $e$ constitutes a column vector of ones. Subsequently, $K$ undergoes eigendecomposition, leading to the formulation $K = B\,D\,B^T$. In this context, $D$ denotes a diagonal matrix housing the eigenvalues of $K$, and $B$ encompasses the corresponding eigenvectors. For judicious retention of the foremost $k$ eigenvalues, the remaining values and their associated vectors are omitted. Thus, $D$ and $B$ reduce to $\hat{D}$ and $B_1$, where $\hat{D} \in \mathbb{R}^{k \times k}$ constitutes the diagonal matrix of the leading $k$ eigenvalues, and $B_1 \in \mathbb{R}^{n \times k}$ incorporates the prominent $k$ eigenvectors. Consequently, the data are maintained in a lower dimension through the following transformation:
$$Y = \hat{D}^{1/2}\,B_1^T \quad (2)$$
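In practice, an MDS embedding of the kind described above can be obtained off the shelf. The sketch below uses scikit-learn's `MDS` on synthetic data (illustrative only); note that this implementation minimizes a stress criterion via SMACOF rather than performing the eigendecomposition of Equations (1) and (2) explicitly, so it is a practical stand-in, not a line-by-line translation.

```python
# MDS sketch: embed 10-D synthetic points in 2-D while preserving
# pairwise Euclidean distances as well as possible.
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 10))   # 150 samples, 10 features (synthetic)

# The Euclidean dissimilarity matrix d_X is computed internally; the solver
# seeks coordinates Y whose pairwise distances d_Y best match d_X.
mds = MDS(n_components=2, dissimilarity="euclidean", random_state=0)
Y = mds.fit_transform(X)

print(Y.shape)                   # (150, 2)
```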

2.2. Kernel Principal Component Analysis (KPCA)

Principal component analysis (PCA) is a widely used technique for dimension reduction and feature extraction; however, it offers limited performance when dealing with non-linear data because the resultant subspace may not optimally capture the underlying structures. This limitation is effectively addressed using kernel PCA (KPCA), which employs the “kernel trick”, represented by the kernel function K as follows [32,47,48]:
$$K(x_i, x_j) = \phi(x_i)^T \phi(x_j) \quad (3)$$
KPCA leverages this kernel function to project data into a higher-dimensional feature space, enabling the data to be linearly separable and reveal previously hidden patterns and structures. The central concept driving KPCA involves mapping the input data $X$ onto a higher-dimensional space using polar coordinates before applying the kernel trick [32]. This kernel function obviates the need for explicit data mapping via the function $\phi$, thereby enhancing computational efficiency [31,47]. The selection of an appropriate kernel function is pivotal in KPCA and depends on the nature of the data. The commonly employed kernel functions are listed in Table 1 [37]. Interested readers can refer to [14,19,29] for a detailed explanation.
Upon applying the kernel trick, the subsequent step involves employing PCA on the transformed linearly separable data, thereby effectively reducing its dimensionality. By incorporating KPCA, the limitations of traditional PCA on non-linear data can be circumvented, allowing for the extraction of essential features and patterns that would have otherwise remained concealed. This advanced technique has widespread applications in fields, such as image analysis and bioinformatics, offering an invaluable tool for gaining insights from complex high-dimensional datasets.
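The two-step recipe (kernel trick, then PCA in the induced feature space) is implemented directly by scikit-learn's `KernelPCA`. A minimal sketch on synthetic data follows; the RBF kernel and the `gamma` value are illustrative choices, not the kernel settings used in the study.

```python
# KPCA sketch: non-linear feature extraction with an RBF kernel,
# K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2).
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 10))   # synthetic data

kpca = KernelPCA(n_components=3, kernel="rbf", gamma=0.1)
Z = kpca.fit_transform(X)        # coordinates along the top-3 kernel principal components

print(Z.shape)                   # (150, 3)
```

Swapping `kernel="rbf"` for `"poly"` or `"sigmoid"` reproduces the other common choices typically tabulated for KPCA.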

2.3. Local Linear Embedding (LLE)

Local linear embedding (LLE) is a powerful non-linear and unsupervised dimension-reduction technique that operates under the premise that data reside on a smooth non-linear manifold within a high-dimensional feature space. Unlike MDS, LLE is a local non-linear technique that focuses on maintaining the local characteristics of the data [46]. By capturing the intrinsic relationships and local properties of data, LLE uncovers essential patterns that may remain concealed in the original high-dimensional space. The novelty of LLE is its ability to maintain geometric features and local properties during dimensionality reduction, thereby accurately representing the underlying data structure. This preservation is achieved by considering the linear combination of data points ($x_i$) with their c-nearest neighbors ($x_j$), guided by the reconstruction weights ($W_{ij}$). The dimensionality reduction process comprises a series of steps as follows:
Nearest-neighbor identification: LLE commences by identifying the c-nearest neighbors of each data point using the Euclidean distance metric. These neighbors play a pivotal role in the construction of local linear relationships.
Local reconstruction weights: For each data point $x_i$, LLE computes the reconstruction weights that optimally reconstruct $x_i$ as a linear combination of its neighbors. The goal is to minimize the reconstruction error ($\mathrm{Error}(W)$), ensuring that the local properties of $x_i$ are preserved in the low-dimensional space, as follows [49]:
$$\mathrm{Error}(W) = \sum_i \Big\| x_i - \sum_j W_{ij}\, x_j \Big\|^2 \quad (4)$$
Through eigenvector-based optimization, LLE defines a new vector space (low-dimensional embedding) denoted as Y . This space is crafted to minimize the cost associated with Y , which is achieved by accurately reconstructing the data points from their nearest neighbors. This optimization process ensures that local linear relationships are transferred to a lower-dimensional representation.
The concept of reconstruction error is central to LLE, which quantifies the extent to which a data point can be reconstructed based on its neighbors. The contribution of each neighbor to the reconstruction of a data point is encapsulated in the term $W_{ij}$, which summarizes the influence of neighbor $x_j$ on the reconstruction of $x_i$. LLE transforms high-dimensional data into a lower-dimensional space while retaining the local properties and relationships that define the underlying data structure. By emphasizing the local relationships and manifold geometry, LLE offers a powerful tool for understanding and visualizing complex datasets, and this ability renders it beneficial for manifold learning, pattern recognition, and data analysis. Interested readers can refer to [35,37].
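The neighbor search, weight fitting, and eigenvector optimization steps are bundled in scikit-learn's `LocallyLinearEmbedding`. The sketch below unrolls a swiss roll, a standard synthetic 2-D manifold embedded in 3-D; the neighbor count is an illustrative choice.

```python
# LLE sketch: recover the 2-D structure of a swiss-roll manifold.
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=400, random_state=0)   # 3-D points on a 2-D manifold

# c-nearest neighbors and the reconstruction weights W_ij are fit internally.
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, random_state=0)
Y = lle.fit_transform(X)

print(Y.shape)                        # (400, 2)
print(lle.reconstruction_error_)      # value of the minimized Error(W)
```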

2.4. Isometric Mapping (ISOMAP)

Isometric Mapping, known as Isomap, is a global non-linear dimensionality reduction technique that aims to capture the underlying geometric structure of high-dimensional data by preserving pairwise geodesic distances between data points [34,46]. It constructs a lower-dimensional representation that maintains the intrinsic distances, or "geodesic distances", between points on a manifold. Isomap is particularly advantageous for datasets with nonlinear structures or when the data lie on a curved manifold. This technique initially generates a neighborhood graph for geodesic distance determination. Rather than Euclidean distances, Isomap focuses on geodesic distances that measure the shortest path along the manifold between two data points. The computed geodesic distances are used to construct a matrix $d_G$ that captures the pairwise geodesic distances between data points. This matrix $d_G$ is subsequently factorized, typically using techniques such as classical MDS with a centering matrix $H$, to determine the lower-dimensional representation that optimally preserves geodesic distances, as follows:
$$K = H\,d_G\,H \quad (5)$$
Isomap proceeds to determine the lower-dimensional representation of the data points using the expression provided in Equation (2), where $\hat{D}$ and $B$ denote the corresponding eigenvalues and eigenvectors [47].
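The full pipeline (neighborhood graph, shortest-path geodesic distances, classical MDS) is available as scikit-learn's `Isomap`. A minimal swiss-roll sketch, with an illustrative neighbor count:

```python
# Isomap sketch: geodesic-distance-preserving embedding of a swiss roll.
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, _ = make_swiss_roll(n_samples=400, random_state=0)

# Builds the k-NN graph, computes graph shortest paths as geodesic
# distances d_G, then applies classical MDS to the resulting matrix.
iso = Isomap(n_neighbors=10, n_components=2)
Y = iso.fit_transform(X)

print(Y.shape)   # (400, 2)
```

Because geodesic rather than straight-line distances are preserved, the roll is "unrolled" into a flat 2-D sheet, which Euclidean MDS alone cannot achieve.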

2.5. Laplacian Eigenmap (LE)

The Laplacian eigenmap (LE) is a local non-linear unsupervised DR technique that aims to capture the underlying data structure by focusing on the relationships between data points and their neighbors [50]. This approach is particularly effective for non-linear dimensionality reduction and manifold learning. The LE constructs a lower-dimensional representation by emphasizing the intrinsic geometry of the data and preserving local relationships. Similar to LLE, the LE begins by constructing a graph $G$ of the data, where data points are nodes, and edges represent the relationships between neighboring points. The weights of the edges in the graph are obtained using the Gaussian (RBF) kernel function, as follows [50]:
$$W_{ij} = e^{-\frac{\|x_i - x_j\|^2}{2\sigma^2}} \quad (6)$$
where $\sigma$ represents the Gaussian variance, which is used as the input for forming the sparse adjacency matrix $W$. During the computation of the lower-dimensional representations $y_i$, the focus is on minimizing the cost function, which is formulated as follows [50]:
$$\phi(Y) = \sum_{ij} \|y_i - y_j\|^2 \, W_{ij} \quad (7)$$
The allocation of $W_{ij}$ within the cost function corresponds to minimizing the distances between the data points $x_i$ and $x_j$. Consequently, the difference between the lower-dimensional representations $y_i$ and $y_j$ significantly affects the cost function. This results in neighboring points in the high-dimensional space converging to proximity in the reduced-dimensional representation. The computation of the degree matrix $M$ and graph Laplacian $L$ of the graph $W$ is pivotal for framing the minimization problem as an eigenproblem. The Laplacian matrix $L$ is obtained as $L = M - W$ and is subjected to eigendecomposition, yielding a set of eigenvalues and corresponding eigenvectors. The reformulated cost function is as follows [50]:
$$\phi(Y) = \sum_{ij} \|y_i - y_j\|^2 \, W_{ij} = 2\, Y^T L Y \quad (8)$$
The LE computes the Laplacian matrix, which characterizes the relationships between data points. The Laplacian matrix captures both the local structure and the smoothness of the data distribution. A lower-dimensional representation is formed by selecting the top eigenvectors associated with the smallest eigenvalues. These eigenvectors define a new space in which data points are projected, resulting in lower-dimensional embedding.
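In scikit-learn, Laplacian eigenmaps are exposed as `SpectralEmbedding` (its documentation names the method explicitly). The sketch below is illustrative: the neighborhood-graph affinity and neighbor count are assumed settings, not the study's configuration.

```python
# Laplacian eigenmap sketch via spectral embedding of a swiss roll.
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import SpectralEmbedding

X, _ = make_swiss_roll(n_samples=400, random_state=0)

# Builds a neighborhood graph, forms the graph Laplacian L = M - W, and keeps
# the eigenvectors associated with the smallest non-trivial eigenvalues.
le = SpectralEmbedding(n_components=2, affinity="nearest_neighbors",
                       n_neighbors=10, random_state=0)
Y = le.fit_transform(X)

print(Y.shape)   # (400, 2)
```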

2.6. t-Distributed Stochastic Neighbor Embedding (t-SNE)

Another non-linear unsupervised DR technique employed in this study was t-distributed stochastic neighbor embedding (t-SNE), proposed by van der Maaten and Hinton in 2008 [36]. Rather than relying on a Gaussian distribution, this method evaluates the similarity between two points within the reduced-dimensional space using Student's t-distribution. Given an input dataset with dimensions delineated as $X = \{x_1, x_2, \dots, x_n\} \subset \mathbb{R}^d$, the t-SNE method derives an embedding in s dimensions represented as $Y = \{y_1, y_2, \dots, y_n\} \subset \mathbb{R}^s$. The method determines similarities in the original high-dimensional dataset using conditional probabilities $p_{j|i}$ based on a Gaussian distribution. Subsequently, in the reduced-dimensional space, the joint probabilities $q_{ij}$ are computed using Student's t-distribution, which is typified by a single degree of freedom, as follows:
$$p_{j|i} = \frac{\exp\left(-\|x_i - x_j\|^2 / 2\sigma^2\right)}{\sum_{k \neq l} \exp\left(-\|x_k - x_l\|^2 / 2\sigma^2\right)} \quad (9)$$
$$q_{ij} = \frac{\left(1 + \|y_i - y_j\|^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \|y_k - y_l\|^2\right)^{-1}} \quad (10)$$
The t-SNE algorithm identifies a set of points $y_1, y_2, \dots, y_n$ that correspond to the projections of each input $x_i$ into the reduced space, represented as $y_i$. The method aims to reduce the Kullback–Leibler divergence between the distribution $P$ associated with the original input space and the distribution $Q$ pertaining to the embedded space by employing the following gradient descent step:
$$\frac{\delta C}{\delta y_i} = 4 \sum_j (p_{ij} - q_{ij})(y_i - y_j)\left(1 + \|y_i - y_j\|^2\right)^{-1} \quad (11)$$
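The gradient descent on the Kullback–Leibler divergence is handled internally by scikit-learn's `TSNE`. A minimal swiss-roll sketch follows; the perplexity (which governs the effective neighborhood size through $\sigma$) is an illustrative setting.

```python
# t-SNE sketch: 2-D embedding of a swiss roll by matching the
# Gaussian-based P distribution with the Student-t-based Q distribution.
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import TSNE

X, _ = make_swiss_roll(n_samples=400, random_state=0)

tsne = TSNE(n_components=2, perplexity=30.0, random_state=0)
Y = tsne.fit_transform(X)

print(Y.shape)   # (400, 2)
```

Unlike the other techniques above, t-SNE learns no reusable mapping, so it cannot transform unseen samples; this matters when the reduced features feed a train/validation split.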

2.7. Autoencoder (AE)

Neural networks, specifically autoencoders (AE), offer an alternative method for non-linear DR [51]. The AE has a unique setup wherein the output mirrors the input; however, it contains a central bottleneck layer with notably fewer neurons [37]. This forces the network to discern a significant low-dimensional interpretation of the data. Research indicates that a basic AE with only a single hidden layer and a linear activation function can produce an embedding comparable to that of PCA. However, when dealing with intricate non-linear phenomena such as fluid dynamics, deep AEs with non-linear activation functions tend to be more effective than PCA. Therefore, in this study, fully connected deep neural networks, often termed multilayer perceptrons (MLP), were employed to reduce the dimensions. A distinct benefit of autoencoders in comparison with manifold learning techniques is the ease with which data can be reconstructed using the decoding network.
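The bottleneck idea can be sketched without a deep learning framework by training scikit-learn's `MLPRegressor` to reproduce its own input. This is a deliberately simplified stand-in for the study's deep AE (the layer sizes, activation, and `encode` helper are our illustrative choices), but it shows the encode/decode structure: the 3-unit middle layer is the bottleneck, and the first half of the network acts as the encoder.

```python
# Autoencoder sketch: an MLP trained input -> input with a 3-unit bottleneck.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X = MinMaxScaler().fit_transform(rng.normal(size=(300, 31)))  # synthetic 31-variable data

# hidden_layer_sizes = (16, 3, 16): encoder 31->16->3, decoder 3->16->31.
ae = MLPRegressor(hidden_layer_sizes=(16, 3, 16), activation="relu",
                  max_iter=300, random_state=0)
ae.fit(X, X)  # reconstruction objective: target equals input

def encode(X, ae, bottleneck_layer=2):
    """Forward pass through the encoder half, up to the bottleneck layer."""
    H = X
    for W, b in list(zip(ae.coefs_, ae.intercepts_))[:bottleneck_layer]:
        H = np.maximum(0.0, H @ W + b)   # ReLU hidden activations
    return H

Z = encode(X, ae)
print(Z.shape)   # (300, 3): the learned low-dimensional representation
```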

2.8. Experiment Settings

Prior to embarking on the DR process, we preprocessed the data. This involved both min-max normalization and log transformation, aiming to improve learning and ensure faster convergence. This preliminary phase laid the groundwork for the subsequent model development. Two benchmarks were used for a comprehensive evaluation: PCA, a conventional DR technique, and univariate MSA prediction without DR. Moreover, in this study, we defined a baseline model that repeats the last input time steps (one year, i.e., 12 monthly time steps) as the prediction for MSA streamflow. To underscore the efficiency of the DR algorithms, five prediction layers (linear, dense, CNN, LSTM, and AR-LSTM) were chosen for comparative analysis. The mean squared error (MSE) between the predicted and observed values during the validation phase served as the evaluation metric.
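The last-year-repeat baseline can be written in a few lines. The helper below is a hypothetical illustration of the described behavior, not the study's code; for a 12-step horizon it simply echoes the most recent 12 monthly values.

```python
# Naive MSA baseline: repeat the last observed year as the forecast.
import numpy as np

def repeat_last_year_baseline(series, horizon=12):
    """Return the last 12 monthly values, tiled to cover `horizon` steps."""
    last_year = np.asarray(series)[-12:]
    reps = int(np.ceil(horizon / 12))
    return np.tile(last_year, reps)[:horizon]

flow = np.arange(1, 37)   # 3 synthetic years of monthly flow values
print(repeat_last_year_baseline(flow, horizon=12))   # months 25..36 repeated
```

Any DR-plus-neural-network combination is expected to beat this persistence-style baseline on the validation MSE; otherwise the added complexity is not paying off.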
The prediction models were developed using Python 3.6.9. The prediction model was implemented on an NVIDIA® GeForce® RTX 2070 SUPER GPU and an Intel® Core i9-10920X CPU operating at 3.5 GHz with 128 GB RAM.

2.9. Performance Evaluation Metrics

We utilized a broad spectrum of metrics to assess the forecasting capabilities, extending beyond the standard fit statistics mentioned previously. These metrics include the mean absolute error (MAE), root mean square error (RMSE), and symmetric mean absolute percentage error (SMAPE), which serve as error indicators for prediction performance assessment. In addition, we used three goodness-of-fit indices: the correlation coefficient (R), Legates–McCabe's efficiency index (LM), and Willmott's index (WI). The mathematical formulations of these evaluation metrics are expressed in Equations (12)–(17).
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| Y_i - \hat{Y}_i \right| \quad (12)$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2} \quad (13)$$
$$\mathrm{SMAPE} = \frac{100}{n} \sum_{i=1}^{n} \frac{\left| \hat{Y}_i - Y_i \right|}{\left( \left| Y_i \right| + | \hat{Y}_i | \right) / 2} \quad (14)$$
$$R = \frac{\sum_{i=1}^{n} \left( Y_i - \bar{Y} \right) ( \hat{Y}_i - \bar{\hat{Y}} )}{\sqrt{\sum_{i=1}^{n} \left( Y_i - \bar{Y} \right)^2} \, \sqrt{\sum_{i=1}^{n} ( \hat{Y}_i - \bar{\hat{Y}} )^2}} \quad (15)$$
$$\mathrm{LM} = 1 - \frac{\sum_{i=1}^{n} | Y_i - \hat{Y}_i |}{\sum_{i=1}^{n} | Y_i - \bar{Y} |} \quad (16)$$
$$\mathrm{WI} = 1 - \frac{\sum_{i=1}^{n} ( Y_i - \hat{Y}_i )^2}{\sum_{i=1}^{n} ( | \hat{Y}_i - \bar{Y} | + | Y_i - \bar{Y} | )^2} \quad (17)$$
where $Y_i$ represents the actual values and $\hat{Y}_i$ indicates the forecasted values, while the averages of these actual and forecasted values are denoted by $\bar{Y}$ and $\bar{\hat{Y}}$, respectively.
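These six metrics translate directly into NumPy. The sketch below is a straightforward implementation of Equations (12)–(17) on a small synthetic example (the variable names and test values are ours).

```python
import numpy as np

def mae(y, yhat):                                   # Eq. (12)
    return np.mean(np.abs(y - yhat))

def rmse(y, yhat):                                  # Eq. (13)
    return np.sqrt(np.mean((y - yhat) ** 2))

def smape(y, yhat):                                 # Eq. (14)
    return 100 / len(y) * np.sum(np.abs(yhat - y) / ((np.abs(y) + np.abs(yhat)) / 2))

def r(y, yhat):                                     # Eq. (15), Pearson correlation
    return np.corrcoef(y, yhat)[0, 1]

def lm(y, yhat):                                    # Eq. (16), Legates-McCabe index
    return 1 - np.sum(np.abs(y - yhat)) / np.sum(np.abs(y - np.mean(y)))

def wi(y, yhat):                                    # Eq. (17), Willmott's index
    ybar = np.mean(y)
    return 1 - np.sum((y - yhat) ** 2) / np.sum((np.abs(yhat - ybar) + np.abs(y - ybar)) ** 2)

y = np.array([1.0, 2.0, 3.0, 4.0])
yhat = np.array([1.1, 1.9, 3.2, 3.8])
print(round(mae(y, yhat), 3))   # 0.15
```

A perfect forecast gives MAE = RMSE = SMAPE = 0 and R = LM = WI = 1, which is a convenient sanity check on the implementation.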

3. Case Study

The proposed methodology was applied to predict MSA streamflow using two open datasets for selected case studies in various regions of the United States. The monthly historical streamflow data for the selected case studies were obtained from the United States Geological Survey (USGS) website, accessed on 1 August 2023 (https://waterdata.usgs.gov/nwis), as listed in Table 2. The mean, standard deviation (std), and maximum value of the monthly streamflow for each case study are given in Table 2. Other input variables, such as precipitation, solar index, humidity, and temperature, were obtained from the National Aeronautics and Space Administration (NASA) Langley Research Center (LaRC) Prediction of Worldwide Energy Resource (POWER) Project, funded by the NASA Earth Science/Applied Science Program (https://power.larc.nasa.gov, accessed on 12 September 2023), as shown in Table 3. The geographical distribution of the selected case studies in the United States is shown in Figure 1. These case studies represent distinct geographical areas with different annual precipitation levels. For instance, one of the case studies was situated in the northern part of North Dakota, while another was located in the southern region of Texas. Additionally, there is a case study in the eastern region of California and one in the western region of Pennsylvania. These diverse locations allow the examination of streamflow behavior across a range of hydrological conditions. Each case study encompasses unique catchment characteristics and annual precipitation levels, which contribute to varying average annual runoff volumes. These differences reflect the dynamic nature of streamflow across different climatic and geographical contexts.
To complement the geographical diversity, the case studies were carefully selected to test the resilience and flexibility of the DR methods under different hydroclimatic conditions [52]. Priority was given to sites with accessible, high-quality, and sufficiently long historical records, ensuring that the ML models are built on a robust data foundation [53,54]. Uniformity in data quality and features across the case studies was a prerequisite for the soundness of the comparative analysis. Furthermore, the diverse hydroclimate features, represented by a broad range of independent and dependent attributes at the chosen sites, heightened the complexity of the datasets. This intricacy is pivotal in assessing the capability of the DR methods to simplify data effectively while preserving vital information, enriching the overall depth and robustness of the study, and ensuring a comprehensive evaluation of the techniques in practical WRM scenarios.

4. Results

This section comprises three stages, as shown in Figure 2. In the first stage, the results of data preprocessing are presented. In the second stage, the results of all the DR techniques are illustrated. In the third stage, various combinations of DR algorithms integrated with five neural network layers, namely, linear, dense, CNN, LSTM, and AR-LSTM, are compared to assess the performance of the DR techniques with two benchmark scenarios.

4.1. First Stage: Data Preprocessing and Analysis

Input data preprocessing begins with min–max normalization to the range of 0.001 to 1, which eliminates zero values and keeps a subsequent log transform feasible for the variables that require it according to the Shapiro–Wilk test. The min–max normalized data for Case Study I are presented in Figure 3, and the corresponding results for the other three case studies are presented in the Supplementary File.
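As a minimal sketch (not the authors' implementation), the scaling step can be written as follows; the non-zero lower bound of 0.001 is what keeps every value strictly positive so the log transform remains defined:

```python
import numpy as np

def minmax_scale(x, lo=0.001, hi=1.0):
    """Min-max normalize a 1-D series into [lo, hi]; lo > 0 keeps
    every value strictly positive so a log transform stays defined."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    return lo + (x - x_min) * (hi - lo) / (x_max - x_min)

flow = np.array([0.0, 5.0, 20.0, 80.0])  # illustrative monthly flows
scaled = minmax_scale(flow)
# the smallest value maps to 0.001 and the largest to 1.0
```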
To assess the normality of the data, the skewness, kurtosis, Shapiro–Wilk p-value, Kolmogorov–Smirnov (KS) statistic, and Anderson–Darling (AD) statistic were investigated. After verifying, for each of the 31 variables, whether a log transform improved normality, the improved variables were replaced with their log-transformed counterparts. Consequently, the log transformation reduced the skewness in the distribution of the variables and rendered them more normal. Subsequently, all variables were standardized and prepared as input for the DR algorithms. The Shapiro–Wilk p-values used to test normality for Case Study I are presented in Figure 4a, and Figure 4b shows the selected variables that were replaced with their log-transformed counterparts. The corresponding results for the other case studies are given in the Supplementary File. Ensuring the normality of the dataset was a paramount preprocessing step, given its significance for various DR algorithms and ML prediction models. Figure 4 demonstrates this procedure. The initial assessment in Figure 4a highlighted pronounced skewness and kurtosis for several hydroclimate variables. After applying the log transformation, as depicted in Figure 4b, there is a noticeable improvement in the skewness and kurtosis of some variables, making their distributions more symmetric. Although not all variables reached perfect symmetry, the mitigation of extreme skewness and kurtosis was evident. Thus, Figure 4 not only emphasizes the importance of data normalization but also highlights the benefits of the chosen method in meeting this essential prerequisite for the subsequent analyses.
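A hedged sketch of the conditional log transform, assuming positivity after the min–max step; the decision rule here (compare Shapiro–Wilk statistics before and after, keeping the log only when it improves) is an illustrative choice, not the authors' exact criterion:

```python
import numpy as np
from scipy import stats

def log_if_improved(x):
    """Apply a log transform only when it makes the series 'more
    normal', judged by the Shapiro-Wilk W statistic (closer to 1
    means closer to normal). Assumes x > 0."""
    x = np.asarray(x, dtype=float)
    w_raw = stats.shapiro(x).statistic
    w_log = stats.shapiro(np.log(x)).statistic
    return (np.log(x), True) if w_log > w_raw else (x, False)

rng = np.random.default_rng(0)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=500)  # heavily right-skewed
transformed, used_log = log_if_improved(skewed)
# for a lognormal sample, the log transform is accepted and skewness shrinks
```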
Subsequently, correlation analysis between each variable and the in situ streamflow was performed for each case study using both the Pearson and Spearman rank correlation algorithms. Figure 5 elucidates the correlation dynamics among all evaluated variables of Case Study I, with Figure 5a displaying the Pearson correlation coefficients and Figure 5b illustrating the Spearman rank correlations. The corresponding results for the other case studies are given in the Supplementary File. The color intensity, ranging from deep blue for strong negative correlations to dark red for strong positive ones, indicates the strength and direction of the relationships in both matrices, while discernible differences between them underscore the distinct nature of linear and monotonic relationships. A primary observation from these heatmaps is the evident strong linear and nonlinear interrelationship among numerous variables, indicating a substantial amount of redundancy in the data. Such redundancy underscores the need for an efficient DR method to eliminate overlapping information and simplify the dataset, ensuring that the prediction model is fed the most pertinent, non-redundant information and thereby enhancing its performance and interpretability. Thus, this analysis not only aids in understanding the relationships among the hydroclimate variables but also emphasizes the importance of data simplification for effective model training.
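The two correlation matrices can be computed in a few lines; the sketch below uses synthetic stand-ins (the variable names and the exponential precipitation-flow link are illustrative, not from the paper) to show why the two measures diverge for monotonic but non-linear relationships:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 120  # e.g. ten years of monthly records
precip = rng.gamma(shape=2.0, scale=10.0, size=n)
temp = rng.normal(15.0, 8.0, size=n)
# synthetic streamflow: monotonically (but non-linearly) tied to precipitation
flow = np.exp(0.05 * precip) + rng.normal(0.0, 0.2, size=n)

df = pd.DataFrame({"precip": precip, "temp": temp, "flow": flow})
pearson = df.corr(method="pearson")    # linear association
spearman = df.corr(method="spearman")  # monotonic (rank) association
# for a convex precip-flow relationship, Spearman exceeds Pearson
```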
The cross-correlation results for lags of up to 24 months between all variables and the in situ streamflow data were extracted for Case Study I, as presented in Figure 6. The corresponding results for the other case studies are given in the Supplementary File. This figure provides a comprehensive visualization of the cross-correlation between the various variables and the streamflow (log-Q) over lag periods ranging from 1 to 24 months. A discernible pattern emerges: most variables display a significant cross-correlation with log-Q from lag 1 to 24, underscoring their potential utility in predicting the next 12 steps of log-Q. Intriguingly, the cross-correlation patterns appear to be cyclical, with almost every variable showing a repeated pattern every 12 months. This implies that the correlation patterns observed from lag 1 to 12 are essentially mirrored in the lag 13 to 24 range, emphasizing a recurrent annual cycle. To streamline the prediction model and mitigate redundancy, only lags 1 to 12 were selected as inputs after employing a DR algorithm to extract salient time-series features. The final subplot captures the autocorrelation of log-Q under the same conditions, suggesting that the preceding 12 months of data should be used in the model to forecast the subsequent 12 months.
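The annual cycle described above can be reproduced with a small lagged-correlation helper; the 12-month sinusoid below is a synthetic stand-in for a seasonal streamflow record, so the peaks at lags 12 and 24 (and the trough at lag 6) are expected by construction:

```python
import numpy as np

def lagged_corr(x, y, lag):
    """Pearson correlation between x shifted back by `lag` steps and y,
    i.e. corr(x[t - lag], y[t])."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.corrcoef(x[:-lag], y[lag:])[0, 1]

# synthetic monthly series with a 12-month cycle plus noise
t = np.arange(240)
rng = np.random.default_rng(2)
flow = np.sin(2 * np.pi * t / 12) + 0.1 * rng.normal(size=t.size)

ccf = [lagged_corr(flow, flow, lag) for lag in range(1, 25)]
# an annual cycle makes the lag-12 and lag-24 autocorrelations peak,
# while the half-period lag 6 is strongly negative
```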

4.2. Second Stage: Informative Feature Synthesis through Dimensionality Reduction

Eight DR algorithms (PCA, KPCA, LLE, LE, t-SNE, MDS, ISOMAP, and AE) were employed to simplify and restructure the input dataset. These DR techniques transform the original data into a compressed representation that retains the information essential for predictive modeling while eliminating noise and redundant details. Determining the optimal number of components in a DR study is essential for improving accuracy. In this study, we aimed to retain 95% of the variance in the original dataset as the basis for selecting the optimal number of components. Because PCA is a robust and widely used DR method, we used PCA as the sole benchmark to determine the number of components for the other DR methods. In the PCA, 95% of the variance of the original datasets was retained with 11 components, and this number was subsequently used for the other DR methods except t-SNE and MDS, for which the number of components was set to three.
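A minimal sketch of the 95%-variance criterion using scikit-learn, on synthetic data standing in for the 31 standardized hydroclimate variables (the five-factor structure is an assumption for illustration): passing a float to `PCA(n_components=...)` makes scikit-learn keep the smallest number of components whose cumulative explained variance reaches that fraction.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
# synthetic stand-in: 31 correlated variables driven by 5 latent factors
latent = rng.normal(size=(300, 5))
mixing = rng.normal(size=(5, 31))
X = latent @ mixing + 0.05 * rng.normal(size=(300, 31))

X_std = StandardScaler().fit_transform(X)

# keep the smallest number of components reaching 95% cumulative variance
pca = PCA(n_components=0.95)
Z = pca.fit_transform(X_std)
n_keep = pca.n_components_  # this count can then seed the other DR methods
```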
For a visual representation of the results of the DR algorithms, pairwise comparison plots of the generated components are illustrated in Figure 7 for Case Study I, and the same results for the other case studies are reported in the Supplementary File. In these scatter plots, each data point corresponds to an observation in the dataset, and its position is determined by the values of the two components. The color of each point represents the associated normalized streamflow values. These plots enable a comprehensive analysis of the inter-component relationships and their connection to the in situ streamflow. Each row and column in the matrix correspond to a specific component, and each cell in the matrix displays a scatter plot depicting the relationship between the corresponding pair of components.
As illustrated in Figure 7, from a general point of view, AE, as a neural network-based method, is able to capture non-linear relationships. In most cases, the scatter plots exhibited complex structures, indicating that the AE components captured intricate patterns. The AE plots differed considerably across the case studies, with certain plots exhibiting tighter clusters, while others exhibited a more cloud-like distribution. This trend suggests that the ability of AE to capture non-linear patterns varies depending on the dataset used. t-SNE is designed to retain local structures and creates distinct clusters.
In all the case studies, the t-SNE scatter plots showed distinct clusters or patterns, highlighting the method's capability to capture non-linear structures. Each case exhibited a unique clustering pattern in the t-SNE plots, reflecting the individual characteristics of each dataset. ISOMAP is a manifold-learning method designed to preserve geodesic (curved) distances. Its scatter plots show spread-out data points, implying that broader structures in the data were captured. The ISOMAP plots varied across the case studies, with some exhibiting more concentrated patterns and others more dispersed ones, indicating the method's sensitivity to dataset-specific characteristics. LE and LLE, in contrast, inherently focus on preserving local structures, which renders them beneficial for datasets in which local patterns are crucial; however, their non-linear nature complicates the direct interpretation of features. Overall, all DR methods exhibited variations in patterns across the four case studies, underscoring the importance of understanding the specific characteristics of each dataset and tailoring the DR method accordingly. As a linear DR method, PCA provides consistent patterns across the case studies, which renders it a reliable choice for capturing primary variations. In contrast, non-linear methods, such as AE, MDS, and t-SNE, exhibited more variability, demonstrating their ability to capture complex patterns. Although each method has its strengths in data representation, the ease of interpretation varies; depending on the application, researchers may need to balance the tradeoff between capturing intricate patterns and obtaining clear, actionable insights. The cross-correlation results of the first three components of all the DR methods in Case Study I are illustrated in Figure 8. For the other case studies, the results are reported in the Supplementary File.
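For readers who wish to reproduce a comparison of this kind, the non-AE methods all have scikit-learn counterparts (SpectralEmbedding implements Laplacian eigenmaps); the sketch below, on synthetic data with deliberate non-linear redundancy, fits each one with mostly default settings rather than the paper's exact configurations:

```python
import numpy as np
from sklearn.decomposition import PCA, KernelPCA
from sklearn.manifold import (
    TSNE, MDS, Isomap, LocallyLinearEmbedding, SpectralEmbedding)

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 10))
# make half the columns non-linear functions of the others (redundancy)
X[:, 5:] = X[:, :5] ** 2 + 0.1 * rng.normal(size=(200, 5))

# scikit-learn counterparts of the paper's DR methods (AE omitted here;
# it would require a separate neural-network framework)
methods = {
    "PCA": PCA(n_components=3),
    "KPCA": KernelPCA(n_components=3, kernel="rbf"),
    "ISOMAP": Isomap(n_components=3),
    "LLE": LocallyLinearEmbedding(n_components=3),
    "LE": SpectralEmbedding(n_components=3),
    "MDS": MDS(n_components=3),
    "t-SNE": TSNE(n_components=3, perplexity=30, init="random"),
}
embeddings = {name: m.fit_transform(X) for name, m in methods.items()}
# each entry is a (200, 3) array of extracted components
```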

4.3. Third Stage: Prediction Model Evaluation

At this stage, the endogenous features were utilized as inputs for the predictive models alongside historical streamflow records. A comprehensive hyperparameter-tuning process was performed using a grid search to ensure an equitable evaluation of all prediction models. The Supplementary File lists the determined near-optimal hyperparameter configurations for all the models. The cross-validation results for the two benchmarks are presented in Figure 9. This study employed a retrospective input sequence with a length of 12 and a batch size of 16. This strategy effectively replicated prevailing annual conditions, enabling the generation of MSA streamflow predictions for the subsequent 12 months. The MSA prediction results (12-step input and 12-step output) for the two benchmarks over one year of the test sets are shown in Figure 10 and Figure 11. The corresponding results for all models over the four years of test sets for Case Study I are provided in the Supplementary File. As shown in Figure 10, using the repeat of the previous year as the baseline yielded better results than the univariate linear and dense models, indicating the poor univariate performance of these two models, particularly for MSA streamflow prediction. As illustrated in Figure 9, simpler models, such as the linear and dense models, exhibited considerable improvement when PCA was used as the DR technique.
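The 12-step-in, 12-step-out setup can be sketched as a sliding-window transformation; the helper below is an illustrative reconstruction (not the authors' code), assuming streamflow occupies column 0 of a (time, features) array:

```python
import numpy as np

def make_windows(series, n_in=12, n_out=12):
    """Slice a (time, features) array into supervised pairs:
    inputs of the previous n_in months, targets of the next n_out
    months of the streamflow column (assumed to be column 0)."""
    X, y = [], []
    for t in range(len(series) - n_in - n_out + 1):
        X.append(series[t:t + n_in])
        y.append(series[t + n_in:t + n_in + n_out, 0])
    return np.stack(X), np.stack(y)

data = np.arange(48, dtype=float).reshape(48, 1)  # 4 years of monthly values
X, y = make_windows(data)
# X.shape -> (25, 12, 1), y.shape -> (25, 12); the first target window
# starts exactly where the first input window ends
```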
The comparison results of the various ML models with different DR techniques for Case Study I are listed in Table 4. For the other case studies, the results are provided in the Supplementary File. The top 10% of values for all performance indices are highlighted in bold relative to the baseline values. As shown in Table 4, the consistent improvement across all case studies indicates the general superiority of DR techniques in enhancing the prediction capability of all types of neural network layers compared with the baseline and univariate models. The application of PCA with CNN led to a 22% improvement in terms of MAE, while MDS-CNN improved the prediction performance by 26% in terms of RMSE and KPCA-CNN by 12% in terms of SMAPE. Moreover, PCA-CNN enhanced the prediction performance by approximately three times compared with the baseline. In terms of WI, both CNN-PCA and CNN-MDS achieved a 29% improvement, along with an improvement of more than 1.5 times in terms of R. In Case Study IV, fitting the CNN with components extracted from AE, ISOMAP, KPCA, and PCA ranked among the top 10% of performance improvements. In Case Studies I and II, the CNN with KPCA and t-SNE was among the top 10% of performance improvements. In Case Study II, PCA and t-SNE improved the performance of the dense LSTM and AR-LSTM models, respectively. Because the results are broadly similar across techniques, a one-size-fits-all approach may overlook the potential benefits of tailoring the DR technique to the prediction layer. Therefore, we prioritize the generalizability of methods over selecting and reporting a single best DR technique.
The improvement of all models relative to the baseline for all case studies is reported in Figure 12. As expected, the baseline model provides the initial benchmark for comparing the performance of the DR techniques aimed at improving the MSA prediction performance in terms of all metrics. As shown in Figure 12, the performance metrics generally improve as we transition from a simple linear model to more complex models, such as CNN and LSTM. This trend suggests that more sophisticated models can capture intricate patterns in the data more effectively. Each DR method performs differently across metrics, and its efficiency depends on how well the extracted components align with the underlying patterns in the data that are relevant to the target variable; this is a major limitation of this study. The trends observed in Case Study I largely recur in the other case studies, although the exact metric values differ based on the unique characteristics of each dataset; Case Study III, for example, exhibited a different performance from the other three case studies. The effectiveness of each DR method varies, reflecting its ability to capture dataset-specific patterns, and similar interpretations apply to the other case studies. The key is to identify the DR methods that consistently perform well across all case studies, indicating their broader applicability and robustness.

5. Discussion

From the viewpoint of generalizability and reliability, across the four case studies which represent diverse climate conditions, PCA, t-SNE, and MDS consistently extracted meaningful components. This demonstrates their general applicability across various hydroclimatic datasets. Although PCA captures primary linear patterns, t-SNE and MDS exhibit non-linear relationships. This combined strength ensures that major linear trends and intricate non-linear patterns are captured, thereby making the ensemble of these methods comprehensive. In other words, it not only facilitates a comprehensive understanding of hydroclimate data but also ensures that the ensemble of these methods is holistically informative and versatile. PCA, with its simplicity and linear approach, offers foundational insights, making it an optimal choice for preliminary analyses. In contrast, the non-linear nature of t-SNE, MDS, and AE unveils the complexity and intricacies of hydroclimate datasets, ensuring meticulous scrutiny of every nuanced pattern within the data, which is essential for developing sustainable WRM strategies. The highlighted findings, pertaining to the high cross-correlation of the first three components extracted using PCA, KPCA, LE, and LLE with target streamflow over lags 1 to 24, reflect the practical implications of these techniques in discerning patterns relevant to streamflow predictions. It reinforces the practical relevance of these DR methods in real-world applications, allowing researchers and practitioners to approach hydroclimate forecasting with an augmented level of confidence and precision. The variable correlation patterns exhibited by components extracted using AE accentuate the necessity for meticulous selection and assessment of DR methods based on the specificity of datasets and the objectives of the study.
Across the different predictive models (linear, dense, CNN, LSTM, and AR-LSTM), the components derived from these DR methods consistently enhanced the model performance compared with the baseline and univariate models, demonstrating their reliability under different modeling scenarios. One of the notable findings of this study was the consistent efficacy of these DR methods across different climatic conditions, whether humid, arid, or temperate, highlighting their broad applicability in extracting features with predictive power and affirming their robustness and reliability. This uniformity in performance, regardless of the dataset's origin or climate type, emphasizes the potency of computer-aided models in synthesizing diverse and complex hydroclimate data for precise and efficient WRM. From the viewpoint of simplicity versus complexity, PCA, as a linear method, yields a simpler interpretation, which renders it suitable for initial analyses; in contrast, t-SNE, MDS, and AE, with their non-linear nature, capture complex patterns, ensuring that no intricate relationships are overlooked. In conclusion, the combined strengths of these DR methods make them a strong choice for hydroclimate data analysis and MSA streamflow prediction. Their consistent performance across diverse datasets supports their generalizability and reliability, ensuring that they capture both the broad trends and the intricate nuances of the data. For researchers and practitioners in the field, these DR methods provide a balanced toolkit for approaching hydroclimate forecasting with confidence.
In our comprehensive examination of DR techniques for hydroclimatic forecasting, we encountered a multifaceted landscape of both strengths and challenges inherent in these methods. The consistent efficacy of PCA, t-SNE, and MDS in extracting meaningful components underscores their robust versatility and reliability, with PCA offering computational frugality and a clear lens on linear trends, while t-SNE and MDS delve deeper into non-linear intricacies, albeit with increased demands on interpretation and computational resources. However, the nuances of employing AE spotlight a critical balance, illustrating variable success between compressing data and preserving essential predictive nuances, thereby emphasizing that method selection and customization require a dataset-tailored, objective-oriented strategy.
In conducting our research, we recognized inherent limitations predominantly linked to the standardized operational settings of the various DR techniques implemented. By adhering to the recommended default parameters, a decision made to uphold methodological uniformity, we may inadvertently have curtailed the exhaustive explorative potential these methods offer. Our strategy, though systematic in maintaining over 95% of the data variance through careful adjustment of component numbers, potentially glosses over the necessity for technique-specific optimization, which is crucial for revealing more nuanced aspects of the data or boosting predictive performance. Consequently, while our findings are solid within the constraints of the current study, they invite further scrutiny; parameter adjustments aligned more closely with the unique data context could lead to different outcomes, implying that our insights, while valuable, are not conclusively applicable in every scenario.
This complex interplay is further evidenced by the boost in model performance across various predictive frameworks, confirming the DR methods’ practical efficacy in real-world forecasting. Still, this necessitates a nuanced, context-aware application, harmonizing the simplicity of analysis with the uncovering of multifaceted hydroclimatic patterns. By acknowledging these intricacies, researchers and practitioners in WRM are poised to harness these techniques’ strengths while mitigating their limitations, thereby facilitating more accurate, efficient, and context-adapted hydroclimate forecasting. Our reflection on these methodologies and results serves not as a critique but as a springboard for future studies, calling for subsequent studies to adopt a more granular, deliberate calibration of DR settings in harmony with dataset peculiarities and study ambitions. Such refinements could potentially unlock untapped potential within these DR techniques, leading to advancements in hydroclimatic forecasting precision, efficiency, and adaptability, shaping more informed WRM paradigms.
In the realm of computer-aided modeling for diverse scientific applications, our study echoes and diverges from established pathways documented in previous research. The robust applicability of PCA, t-SNE, and MDS documented in our work finds parallels in the broader computational research landscape, as seen in [55,56,57,58], affirming their reliability in data compression and insightful pattern discernment. Nonetheless, our findings introduce a nuanced perspective, emphasizing the sensitivity of these DR methods to diverse data characteristics, a critical insight in the context of variable climate conditions. While prior studies, such as [59], have acknowledged the merits of PCA in data analysis, our research advocates a more integrated approach. We suggest leveraging a combination of DR methods to comprehensively interpret the multi-dimensional nature of hydroclimatic phenomena and their subsequent impact when integrated with various ANNs.
In contrast to the methodological approach in [16], which harnessed 3D-CNN and time-distributed 2D-CNN for dimensionality reduction in satellite-derived geospatial data, our study offers complementary insights. Specifically, the DR methods employed in our research, notably PCA, t-SNE, and MDS, exhibited proficiency in recognizing both linear and non-linear patterns, which are crucial for effective streamflow prediction from complex hydroclimate data. Unlike the specialized structures of 3D-CNN and time-distributed 2D-CNN, which cater to spatial-temporal dynamics, the DR techniques highlighted in our study demonstrate broader adaptability across various hydroclimatic datasets and NN frameworks. This adaptability, evidenced through consistent effectiveness across diverse climatic regions, accentuates the universality and resilience of our selected DR methods. Nonetheless, it is worth acknowledging that the specific encoder configurations used in [16] may offer more targeted insights for scenarios with pronounced geo-spatiotemporal interplay. By broadening the investigative lens to incorporate various DR strategies and NN models, our study enhances understanding of their collective utility in hydroclimate forecasting. Moreover, the performance comparison among linear, dense, CNN, LSTM, and AR-LSTM models in our research provides a layered understanding of how DR methodologies can synergize to improve forecasting accuracy in hydroclimate contexts. We underline the adaptability and efficacy of DR strategies, especially when judiciously paired with ANNs. This is a testament to their flexibility in addressing diverse scientific inquiries within the computer-aided modeling paradigm. This observation underscores a pivotal theme: the success of DR methods, including AE, is contingent on their alignment with specific dataset characteristics, presenting a fertile avenue for future in-depth explorations.

6. Contribution to Future Studies

Our study charts a pivotal direction for hydroclimatic research, simultaneously laying a firm foundation and sparking new avenues for deeper exploration. By vouching for the versatility of various DR techniques in the intricate world of climatic data, we arm future investigations with a solid, evidence-based toolkit for preliminary analysis. Yet, it is paramount to recognize that the success of these techniques, especially AE, hinges on the specific nuances of the dataset at hand and the overarching research goals. This underscores the imperative for a meticulous, tailored approach in employing DR methodologies, one that resonates profoundly with each dataset’s unique character, potentially elevating the precision and depth of hydroclimatic forecasting.
But our contribution is more than a mere endorsement of DR tools; it is a call to delve deeper, dissect, and discern. We urge the research community to interrogate the conditions under which these techniques excel and where they might falter, seeking a harmonious blend of computational efficiency, clear interpretation, and predictive insight. In essence, this is not just about using DR methods but refining, adapting, and critically examining them in hydroclimatic forecasting. Embracing this detailed, discerning lens promises not only advanced academic rigor but also a leap toward informed, adaptive, and foresighted environmental management.

7. Conclusions

This research emerges at a juncture when the hydrological domain is on the cusp of embracing advanced methodologies to tackle the pervasive challenge of the curse of dimensionality, which is particularly prevalent in MSA streamflow predictions. Although the literature offers glimpses into the potential of non-linear unsupervised DR techniques, their comprehensive application in hydrology remains relatively unexplored. Recognizing this gap, our research embarked on an innovative journey, fusing the potency of eight DR techniques with five different neural network architectures, to optimize the accuracy of MSA monthly streamflow predictions across four distinct case studies in the USA. The introduced hybrid models, encompassing combinations of KPCA/PCA/LLE/LE/t-SNE/MDS/ISOMAP/AE with linear/dense/CNN/LSTM/AR-LSTM, are testaments to our pioneering efforts in integrating cutting-edge DR techniques and mark their debut in the realm of MSA streamflow prediction. This synergy, crafted between DR techniques and the prediction algorithm, not only addresses the curse of dimensionality but also pioneers an enhanced modeling approach for the hydrological sector.
Our study, rooted in innovation and exploration, illuminates a pathway for the future of WRM. By harmonizing the intrinsic strengths of DR with the robust predictive capabilities of LSTM, AR-LSTM, and CNN, we charted a course that promises enhanced accuracy and efficiency in MSA streamflow predictions, marking a significant stride in the evolution of forecasting techniques in hydrology. This study provides valuable insights into the effectiveness of various DR techniques in enhancing streamflow prediction models. However, the limited hyperparameter tuning for each DR technique and the determination of the number of components based solely on PCA benchmarking highlight potential areas for improvement. The choice of DR method should be guided by the nature of the data and the forecasting goals: in some scenarios, capturing primary linear patterns may suffice, whereas in others, delving into non-linear relationships may be pivotal. Future studies may benefit from a more comprehensive hyperparameter-tuning approach for the DR methods to ensure that the potential of each method is fully realized. In addition, exploring methods to determine the optimal number of components for each DR technique, rather than relying solely on PCA, may yield more tailored and potentially more informative datasets. Despite these limitations, this study offers a strong foundation for understanding the interplay between DR methods and hydroclimate forecasting, thereby setting the stage for further refined analyses.
In conclusion, while our findings bolster confidence in the selected DR methods’ applicability, they also caution against a one-size-fits-all approach. Instead, they advocate for a more nuanced employment, tailored to the unique contours of each research context. By charting this territory, our study serves as a catalyst for refined methodological approaches in future hydroclimatic research, potentially paving the way for breakthroughs in forecasting accuracy and resource management strategies.

Supplementary Materials

The supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/su152215761/s1. Figure S1. The min–max normalized data for (a) Case Study II, (b) Case Study III, and (c) Case Study IV. Figure S2. The results of the Shapiro–Wilk p-value test (a) before and (b) after applying the log transform for Case Study II. Figure S3. The results of the Shapiro–Wilk p-value test (a) before and (b) after applying the log transform for Case Study III. Figure S4. The results of the Shapiro–Wilk p-value test (a) before and (b) after applying the log transform for Case Study IV. Figure S5. The heatmap of Pearson correlation and Spearman rank correlation among all variables for (a) Case Study II, (b) Case Study III, and (c) Case Study IV. Figure S6. Cross-correlation between all variables and streamflow for (a) Case Study II, (b) Case Study III, and (c) Case Study IV. Figure S7. Scatter plots for each pair of components generated from (a) PCA, (b) KPCA, (c) LLE, (d) LE, (e) t-SNE, (f) MDS, (g) ISOMAP, and (h) AE for Case Study II. Figure S8. Scatter plots for each pair of components generated from (a) PCA, (b) KPCA, (c) LLE, (d) LE, (e) t-SNE, (f) MDS, (g) ISOMAP, and (h) AE for Case Study III. Figure S9. Scatter plots for each pair of components generated from (a) PCA, (b) KPCA, (c) LLE, (d) LE, (e) t-SNE, (f) MDS, (g) ISOMAP, and (h) AE for Case Study IV. Figure S10. 12-step ahead prediction results of all models over the four years of test sets for Case Study I. Table S1. Near-optimal values of the hyperparameter configurations for all the models. Table S2. 12-step ahead monthly streamflow prediction performance of different algorithms in Case Study II. Table S3. 12-step ahead monthly streamflow prediction performance of different algorithms in Case Study III. Table S4. 12-step ahead monthly streamflow prediction performance of different algorithms in Case Study IV.

Author Contributions

F.G.: Conceptualization, Methodology, Investigation, Software, Validation, Formal analysis, data curation, writing—original draft, writing—review and editing, and visualization. A.S.T.C.: Conceptualization, Methodology, Investigation, Software, Validation, Formal analysis, data curation. D.K.: Supervision, Validation, Review and editing, Resources, Funding acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by (1) Korea Environment Industry & Technology Institute (KEITI) through the Water Management Program for Drought, funded by the Korea Ministry of Environment (MOE) (RS-2023-0023194) and (2) Korea Institute of Energy Technology Evaluation and Planning (KETEP) and the Ministry of Trade, Industry & Energy (MOTIE) of the Republic of Korea (20224000000260).

Data Availability Statement

Data will be made available on request.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Location of four selected case study basins.
Figure 2. The sequential steps of the study, encompassing data collection, preprocessing, feature synthesis, and the assessment of the impacts of feature extraction on various neural networks for MSA streamflow prediction.
Figure 3. The min-max normalized data for case study I.
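Min-max normalization, as used for the data in Figure 3, rescales each variable to the [0, 1] range via (x − min)/(max − min). A minimal sketch with hypothetical streamflow values:

```python
import numpy as np

def min_max_normalize(x):
    """Rescale a 1-D array to [0, 1]: (x - min) / (max - min)."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    # Guard against a constant series, which would divide by zero
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

# Hypothetical monthly streamflow values
flow = np.array([111.6, 94.3, 620.5, 85.6, 245.9])
scaled = min_max_normalize(flow)  # minimum maps to 0, maximum to 1
```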
Figure 4. The results of the Shapiro-Wilk p-value test (a) to check data normality and (b) after applying a log transformation.
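The Figure 4 workflow, testing normality before and after a log transform, can be sketched with SciPy on a hypothetical right-skewed (lognormal) sample; a p-value above 0.05 means normality is not rejected at the 5% level:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical right-skewed, streamflow-like sample
flow = rng.lognormal(mean=3.0, sigma=1.0, size=200)

# Shapiro-Wilk test on the raw data (should strongly reject normality)
p_raw = stats.shapiro(flow).pvalue

# Same test after a log transform (the log of a lognormal sample is normal)
p_log = stats.shapiro(np.log(flow)).pvalue
```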
Figure 5. The heatmap of (a) Pearson correlation and (b) Spearman rank correlation among all variables.
Figure 6. Cross-correlation between all variables and streamflow for case study I.
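The lagged cross-correlations in Figure 6 can be computed as Pearson correlations between a shifted predictor and streamflow. A sketch on synthetic data in which precipitation leads flow by two months (all series and parameter values are hypothetical):

```python
import numpy as np

def cross_corr(x, y, max_lag=12):
    """Pearson correlation between x shifted forward by `lag` steps and y,
    for lags 0..max_lag (positive lag means x leads y)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    out = {}
    for lag in range(max_lag + 1):
        a, b = (x, y) if lag == 0 else (x[:-lag], y[lag:])
        out[lag] = np.corrcoef(a, b)[0, 1]
    return out

rng = np.random.default_rng(0)
precip = rng.normal(size=240)                         # 20 years of monthly data
flow = np.roll(precip, 2) + 0.1 * rng.normal(size=240)  # flow lags precip by 2
lags = cross_corr(precip, flow, max_lag=6)
best = max(lags, key=lags.get)                        # lag with strongest correlation
```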
Figure 7. Scatter plots for each pair of components generated from (a) PCA, (b) KPCA, (c) LLE, (d) LE, (e) t-SNE, (f) MDS, (g) ISOMAP and (h) AE for case study I.
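Of the eight DR techniques compared in Figure 7, all but the autoencoder are available off the shelf in scikit-learn. A minimal sketch on a hypothetical normalized feature matrix (the AE, being a small neural network, is omitted here; component counts and all data are illustrative only):

```python
import numpy as np
from sklearn.decomposition import PCA, KernelPCA
from sklearn.manifold import (Isomap, LocallyLinearEmbedding, MDS,
                              SpectralEmbedding, TSNE)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))  # hypothetical normalized feature matrix

reducers = {
    "PCA": PCA(n_components=3),
    "KPCA": KernelPCA(n_components=3, kernel="rbf"),
    "LLE": LocallyLinearEmbedding(n_components=3),
    "LE": SpectralEmbedding(n_components=3),  # Laplacian eigenmaps
    "t-SNE": TSNE(n_components=3, init="pca"),
    "MDS": MDS(n_components=3),
    "ISOMAP": Isomap(n_components=3),
}
# Each embedding has one row per sample and one column per component
embeddings = {name: r.fit_transform(X) for name, r in reducers.items()}
```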
Figure 8. Cross-correlation between first three extracted components and normalized streamflow for case study I.
Figure 9. The results of cross validation for (a) univariate benchmark models and (b) integrated with PCA for case study I.
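Figure 9 reports cross-validation results; for time-series models, chronological splits avoid leaking future information into training. A generic sketch using scikit-learn's TimeSeriesSplit (the paper's exact fold scheme is not restated here, so the fold count and series length below are assumptions):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

y = np.arange(120)  # 120 hypothetical monthly observations
tscv = TimeSeriesSplit(n_splits=4)

folds = []
for train_idx, test_idx in tscv.split(y):
    # Each test fold starts strictly after its training window ends
    folds.append((train_idx.max(), test_idx.min()))
```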
Figure 10. 12-step ahead prediction results of (a) baseline, (b) univariate-linear, (c) univariate-dense, (d) univariate-CNN, (e) univariate-LSTM, and (f) univariate-AR-LSTM for case study I.
Figure 11. 12-step ahead prediction results of (a) PCA-linear, (b) PCA-dense, (c) PCA-CNN, (d) PCA-LSTM, and (e) PCA-AR-LSTM models for case study I.
Figure 12. Performance enhancement of various models, illustrated by dark blue (positive improvement) and red (negative improvement) bars, across four case studies: (a) I, (b) II, (c) III, and (d) IV.
Table 1. Different types of kernel function with their explanation.
Kernel Type | Explanation | Equation
Linear | Suitable for linearly separable data | K(x_i, x_j) = x_i^T x_j + c
Polynomial | Captures the resemblance between data points within a feature space using polynomial expressions | K(x_i, x_j) = (α x_i^T x_j + c)^d
Radial basis function (RBF) | Employed for non-linearly separable data; also known as the Gaussian kernel | K(x_i, x_j) = exp(−‖x_i − x_j‖² / (2σ²))
Hyperbolic tangent | Particularly utilized in neural networks | K(x_i, x_j) = tanh(x_i^T x_j + c)
Here c is a constant offset, α a scale factor, d the polynomial degree, and σ the RBF bandwidth.
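The kernel functions in Table 1 can be written directly; a minimal sketch in which the parameter values (c, α, degree, σ) are hypothetical user-chosen constants:

```python
import numpy as np

def linear_kernel(xi, xj, c=1.0):
    """K(x_i, x_j) = x_i^T x_j + c"""
    return xi @ xj + c

def polynomial_kernel(xi, xj, alpha=1.0, c=1.0, degree=3):
    """K(x_i, x_j) = (alpha * x_i^T x_j + c)^degree"""
    return (alpha * (xi @ xj) + c) ** degree

def rbf_kernel(xi, xj, sigma=1.0):
    """Gaussian kernel: K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2))"""
    return np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma ** 2))

def tanh_kernel(xi, xj, c=1.0):
    """Hyperbolic tangent kernel: K(x_i, x_j) = tanh(x_i^T x_j + c)"""
    return np.tanh(xi @ xj + c)

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])
```

For kernel PCA, these functions populate the pairwise kernel (Gram) matrix, whose eigenvectors yield the non-linear components.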
Table 2. Details of the selected case studies and descriptive information of monthly streamflow records.
Case Study | Site No. | G-Name | Lon. (°) | Lat. (°) | Period | Mean | Std | Max
I | 01429000 | Wayne County, Pennsylvania | 75°19′38″ | 41°35′14″ | 1960–2023 | 111.61 | 94.35 | 620.5
II | 06354000 | Morton County, North Dakota | 100°56′04″ | 46°22′34″ | 1934–2022 | 245.94 | 686.16 | 10,070
III | 11292700 | Tuolumne County, California | 120°02′01″ | 38°14′50″ | 1956–2022 | 85.64 | 171.77 | 1076
IV | 08147000 | Lampasas County, Texas | 98°33′51″ | 31°13′04″ | 1930–2023 | 879.31 | 2101.56 | 32,210
Table 3. Details of the meteorological data for four selected case studies.
Meteorological Time Series Data | Abbreviation | Unit | I: Mean | I: Std | II: Mean | II: Std | III: Mean | III: Std | IV: Mean | IV: Std
All Sky Surface Longwave Downward Irradiance | ALLSKY_SFC_LW_DWN | W/m2 | 310.5597 | 45.52792 | 298.1952 | 51.0501 | 300.2963 | 25.55118 | 355.4661 | 42.47993
All Sky Surface Shortwave Downward Irradiance | ALLSKY_SFC_SW_DWN | MJ/m2/day | 13.38689 | 5.58819 | 14.50265 | 6.750604 | 18.90632 | 7.891191 | 17.80108 | 5.456197
All Sky Surface Albedo | ALLSKY_SRF_ALB | dimensionless | 0.138728 | 0.045916 | 0.245066 | 0.141124 | 0.132522 | 0.024763 | 0.146535 | 0.021243
Cloud Amount | CLOUD_AMT | % | 66.99309 | 8.475946 | 60.93566 | 10.23926 | 40.45961 | 19.25167 | 49.64851 | 12.63631
Clear Sky Surface Shortwave Downward Irradiance | CLRSKY_SFC_SW_DWN | MJ/m2/day | 19.36167 | 7.087067 | 18.77336 | 8.21689 | 21.7068 | 7.465475 | 21.96447 | 5.493803
Profile Soil Moisture | GWETPROF | – | 0.64689 | 0.13771 | 0.462175 | 0.051597 | 0.606037 | 0.146993 | 0.586606 | 0.051551
Root Zone Soil Wetness | GWETROOT | – | 0.661545 | 0.14349 | 0.456768 | 0.049415 | 0.620569 | 0.162329 | 0.570285 | 0.048943
Surface Soil Wetness | GWETTOP | – | 0.685142 | 0.129718 | 0.473882 | 0.069647 | 0.549939 | 0.234585 | 0.548577 | 0.095198
Precipitation Corrected | PRECTOTCORR | mm/day | 3.055528 | 1.450598 | 1.302419 | 1.162961 | 2.068476 | 2.500011 | 2.118821 | 1.589853
Precipitation Corrected Sum | PRECTOTCORR_SUM | mm | 87.37781 | 44.86141 | 33.83108 | 35.52685 | 57.37321 | 75.84044 | 59.76175 | 48.81806
Surface Pressure | PS | kPa | 96.18429 | 0.241269 | 94.8137 | 0.209754 | 84.78671 | 0.164394 | 96.69657 | 0.249391
Specific Humidity at 2 Meters | QV2M | g/kg | 6.532663 | 3.420653 | 4.999004 | 2.901456 | 4.546707 | 0.991138 | 9.60122 | 3.609926
Relative Humidity at 2 Meters | RH2M | % | 81.76632 | 7.105356 | 67.33667 | 12.41442 | 55.80063 | 19.08122 | 64.71547 | 8.794498
Temperature at 2 Meters | T2M | °C | 7.851118 | 9.450467 | 6.866077 | 11.73689 | 10.17986 | 7.373067 | 18.99909 | 7.735366
Temperature at 2 Meters Maximum | T2M_MAX | °C | 4.3675 | 8.247092 | 0.04502 | 8.744725 | −0.78945 | 3.042255 | 10.96974 | 6.521172
Temperature at 2 Meters Minimum | T2M_MIN | °C | 6.109329 | 8.829071 | 3.45565 | 10.20423 | 4.695061 | 4.675147 | 14.98445 | 7.018397
Temperature at 2 Meters Range | T2M_RANGE | °C | 21.45144 | 8.520928 | 23.72067 | 11.72145 | 24.28561 | 7.927044 | 32.78012 | 5.893456
Frost Point at 2 Meters | T2MDEW | °C | −4.88157 | 10.07092 | −7.47968 | 13.25339 | −2.50045 | 7.121715 | 5.834492 | 9.539317
Wet Bulb Temperature at 2 Meters | T2MWET | °C | 26.3324 | 4.262249 | 31.20026 | 5.107138 | 26.78665 | 3.726498 | 26.94565 | 5.311394
Top-Of-Atmosphere Shortwave Downward Irradiance | TOA_SW_DWN | MJ/m2/day | 27.79309 | 10.2011 | 25.90081 | 11.26651 | 28.85816 | 9.526711 | 31.10004 | 7.858187
Earth Skin Temperature | TS | °C | 7.673354 | 9.471474 | 7.154797 | 12.16414 | 9.117602 | 7.642648 | 19.65417 | 8.288876
Wind Direction at 10 Meters | WD10M | Degrees | 266.9556 | 35.97723 | 242.339 | 87.34749 | 172.676 | 59.31284 | 189.6561 | 54.66162
Wind Direction at 2 Meters | WD2M | Degrees | 249.514 | 81.85339 | 244.9057 | 87.11628 | 173.9148 | 75.77939 | 190.7448 | 56.06245
Wind Speed at 10 Meters | WS10M | m/s | 2.332744 | 0.612945 | 5.101016 | 0.534256 | 2.445976 | 0.185898 | 4.533455 | 0.574069
Wind Speed at 10 Meters Maximum | WS10M_MAX | m/s | 5.549045 | 1.578291 | 13.46252 | 2.138776 | 6.136768 | 1.289327 | 10.62559 | 1.783986
Wind Speed at 10 Meters Minimum | WS10M_MIN | m/s | 0.229228 | 0.158001 | 0.309106 | 0.181758 | 0.140915 | 0.083949 | 0.3575 | 0.238769
Wind Speed at 10 Meters Range | WS10M_RANGE | m/s | 5.319492 | 1.531181 | 13.15339 | 2.147119 | 5.995589 | 1.302368 | 10.26827 | 1.797659
Wind Speed at 2 Meters | WS2M | m/s | 0.259085 | 0.303912 | 3.586931 | 0.440373 | 0.688008 | 0.054077 | 3.102053 | 0.402283
Wind Speed at 2 Meters Maximum | WS2M_MAX | m/s | 1.187907 | 0.457205 | 9.826809 | 1.685652 | 1.597602 | 0.336021 | 7.552419 | 1.232379
Wind Speed at 2 Meters Minimum | WS2M_MIN | m/s | 0.025935 | 0.047334 | 0.217358 | 0.125412 | 0.043089 | 0.025413 | 0.249776 | 0.150923
Wind Speed at 2 Meters Range | WS2M_RANGE | m/s | 1.161809 | 0.421313 | 9.609451 | 1.68786 | 1.554289 | 0.338407 | 7.302602 | 1.242539
Table 4. 12-step ahead monthly streamflow prediction performance of different algorithms in Case Study I.
Types | Methods | MAE | RMSE | SMAPE | LM | WI | R
Baseline | Univariate | * 0.699 | * 0.952 | * 1.007 | * −0.083 | * 0.568 | * 0.222
Linear | Univariate | 0.694 | 0.806 | 1.522 | −0.076 | 0.447 | 0.223
 | PCA | 0.553 | 0.712 | 0.975 | 0.142 | 0.708 | 0.550
 | KPCA | 0.550 | 0.710 | 0.948 | 0.148 | 0.703 | 0.551
 | LLE | 0.585 | 0.729 | 1.086 | 0.092 | 0.666 | 0.511
 | LE | 0.646 | 0.762 | 1.388 | −0.001 | 0.522 | 0.406
 | t-SNE | 0.560 | 0.718 | 1.005 | 0.132 | 0.684 | 0.527
 | MDS | 0.558 | 0.720 | 1.002 | 0.134 | 0.690 | 0.527
 | ISOMAP | 0.567 | 0.716 | 1.035 | 0.121 | 0.682 | 0.539
 | AE | 0.550 | 0.720 | 0.959 | 0.148 | 0.703 | 0.534
Dense | Univariate | 0.691 | 0.805 | 1.476 | −0.071 | 0.460 | 0.239
 | PCA | 0.561 | 0.727 | 0.967 | 0.130 | 0.706 | 0.541
 | KPCA | 0.563 | 0.723 | 0.970 | 0.127 | 0.700 | 0.543
 | LLE | 0.566 | 0.730 | 0.989 | 0.122 | 0.693 | 0.525
 | LE | 0.583 | 0.743 | 1.045 | 0.096 | 0.664 | 0.492
 | t-SNE | 0.575 | 0.727 | 1.034 | 0.109 | 0.681 | 0.530
 | MDS | 0.569 | 0.728 | 1.010 | 0.117 | 0.692 | 0.534
 | ISOMAP | 0.557 | 0.710 | 0.989 | 0.136 | 0.696 | 0.552
 | AE | 0.555 | 0.731 | 0.955 | 0.139 | 0.693 | 0.523
CNN | Univariate | 0.607 | 0.787 | 1.086 | 0.059 | 0.644 | 0.422
 | PCA | 0.542 | 0.702 | 0.894 | 0.159 | 0.735 | 0.600
 | KPCA | 0.553 | 0.732 | 0.881 | 0.143 | 0.727 | 0.571
 | LLE | 0.565 | 0.751 | 0.926 | 0.124 | 0.720 | 0.545
 | LE | 0.578 | 0.767 | 0.958 | 0.103 | 0.716 | 0.535
 | t-SNE | 0.552 | 0.721 | 0.888 | 0.145 | 0.727 | 0.579
 | MDS | 0.547 | 0.701 | 0.910 | 0.152 | 0.735 | 0.600
 | ISOMAP | 0.547 | 0.706 | 0.906 | 0.151 | 0.727 | 0.590
 | AE | 0.549 | 0.717 | 0.905 | 0.149 | 0.726 | 0.576
LSTM | Univariate | 0.590 | 0.771 | 1.045 | 0.086 | 0.646 | 0.436
 | PCA | 0.576 | 0.760 | 0.947 | 0.106 | 0.694 | 0.504
 | KPCA | 0.568 | 0.746 | 0.930 | 0.119 | 0.718 | 0.544
 | LLE | 0.572 | 0.753 | 0.942 | 0.114 | 0.706 | 0.526
 | LE | 0.572 | 0.762 | 0.989 | 0.113 | 0.665 | 0.463
 | t-SNE | 0.591 | 0.753 | 1.006 | 0.083 | 0.680 | 0.500
 | MDS | 0.569 | 0.742 | 0.960 | 0.117 | 0.690 | 0.516
 | ISOMAP | 0.593 | 0.765 | 1.015 | 0.081 | 0.649 | 0.451
 | AE | 0.585 | 0.768 | 0.964 | 0.093 | 0.683 | 0.494
AR-LSTM | Univariate | 0.571 | 0.783 | 0.932 | 0.115 | 0.699 | 0.507
 | PCA | 0.616 | 0.807 | 0.987 | 0.045 | 0.681 | 0.508
 | KPCA | 0.601 | 0.830 | 0.898 | 0.067 | 0.707 | 0.540
 | LLE | 0.626 | 0.876 | 0.927 | 0.030 | 0.663 | 0.464
 | LE | 0.581 | 0.768 | 1.002 | 0.100 | 0.692 | 0.499
 | t-SNE | 0.580 | 0.752 | 0.940 | 0.100 | 0.688 | 0.508
 | MDS | 0.576 | 0.756 | 0.957 | 0.106 | 0.686 | 0.504
 | ISOMAP | 0.706 | 0.899 | 1.183 | −0.095 | 0.570 | 0.334
 | AE | 0.586 | 0.793 | 0.926 | 0.091 | 0.703 | 0.531
Improvement | | 0.22 | 0.26 | 0.12 | 2.91 | 0.29 | 1.70
* The reference values used to calculate the improvements observed in the corresponding metrics.
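The "Improvement" row appears to be the relative change of each metric's best value with respect to the starred baseline, as a fraction of the baseline's magnitude; a sketch reproducing three of the tabulated entries under that assumption (the sign convention flips between error metrics, where lower is better, and goodness-of-fit metrics, where higher is better):

```python
def relative_improvement(baseline, best, lower_is_better):
    """Relative improvement of `best` over `baseline`, as a fraction of |baseline|."""
    delta = (baseline - best) if lower_is_better else (best - baseline)
    return delta / abs(baseline)

# Values taken from Table 4: starred Univariate baseline vs. best over all models
mae_imp = relative_improvement(0.699, 0.542, lower_is_better=True)   # ~0.22
lm_imp = relative_improvement(-0.083, 0.159, lower_is_better=False)  # ~2.91
r_imp = relative_improvement(0.222, 0.600, lower_is_better=False)    # ~1.70
```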
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Ghobadi, F.; Tayerani Charmchi, A.S.; Kang, D. Feature Extraction from Satellite-Derived Hydroclimate Data: Assessing Impacts on Various Neural Networks for Multi-Step Ahead Streamflow Prediction. Sustainability 2023, 15, 15761. https://doi.org/10.3390/su152215761

