Article

Sparse Sliding-Window Kernel Recursive Least-Squares Channel Prediction for Fast Time-Varying MIMO Systems

1 ZTE Corporation, Algorithm Department, Wireless Product R&D Institute, Wireless Product Operation Division, Shenzhen 518057, China
2 Key Laboratory of Universal Wireless Communications, Ministry of Education of China, Beijing University of Posts and Telecommunications, Beijing 100876, China
* Author to whom correspondence should be addressed.
Sensors 2022, 22(16), 6248; https://doi.org/10.3390/s22166248
Submission received: 10 June 2022 / Revised: 29 July 2022 / Accepted: 13 August 2022 / Published: 19 August 2022
(This article belongs to the Special Issue Resource Allocation for Cooperative Communications)

Abstract

Accurate channel state information (CSI) is important for MIMO systems, especially in high-speed scenarios, where fast time-varying CSI quickly becomes outdated and changes in CSI exhibit complex nonlinearities. The kernel recursive least-squares (KRLS) algorithm, which offers an attractive framework for dealing with nonlinear problems, can be used to predict nonlinear time-varying CSI. However, the network structure of the traditional KRLS algorithm grows as the training sample size increases, resulting in insufficient storage space and increasing computation when dealing with incoming data, which limits online prediction with the KRLS algorithm. This paper proposes a new sparse sliding-window KRLS (SSW-KRLS) algorithm in which a candidate discard set is selected through correlation analysis between the mapping vectors, in the kernel Hilbert space, of the new input sample and the existing samples in the kernel dictionary; the discarded sample is then determined in combination with its corresponding output to achieve dynamic sample updates. Specifically, the proposed SSW-KRLS algorithm maintains the size of the kernel dictionary within the sample budget, requires a fixed amount of memory and computation per time step, incorporates regularization, and achieves online prediction. Moreover, in order to sufficiently track strongly changeable dynamic characteristics, a forgetting factor is incorporated into the proposed algorithm. Numerical simulations demonstrate that, under a realistic 3GPP channel model in a rich scattering environment, our proposed algorithm achieves superior performance, in terms of both predictive accuracy and kernel dictionary size, to that of the ALD-KRLS algorithm. Our proposed SSW-KRLS algorithm with M = 90 achieved an NMSE about 2 dB lower than that of the ALD-KRLS algorithm with v = 0.001, while the kernel dictionary was about 17% smaller, when the speed of the mobile user was 120 km/h.

1. Introduction

Multiantenna technology can fully use spatial dimension resources and dramatically improve the capacity of wireless communication systems without increasing transmission power or bandwidth [1]. Meanwhile, beamforming technology [2,3] is widely used to reduce the interference between cochannel users in cooperative transmission and reception, since it can compensate for the channel fading and distortion caused by the multipath effect. Base stations optimize the allocation of radio resources through appropriate precoding, rendering the desired signal and interference more orthogonal, provided that CSI is known at the base station. Thus, the acquisition of CSI is very important for the cooperation of transmission and reception [4]. However, due to the dynamics of the channel, especially when the terminal is moving at a high speed, the acquisition of CSI is a formidable problem in MIMO systems.
In TDD systems, the user sends a sounding reference signal (SRS), and the base station performs channel estimation using algorithms such as LS [5] and MMSE [6]. The obtained CSI is then used for downlink beamforming to realize cooperative signal processing. The coherence time of a wireless channel is the duration after which CSI is considered outdated. When the terminal is moving at a high speed, the Doppler frequency shift grows and the time variability of the channel is severe, which shortens the channel coherence time. The measured uplink CSI then cannot represent the real channel state of the downlink slots, resulting in a mismatch between the downlink beamforming designed according to the measured CSI and the actual channel. In [1], with a typical CSI delay of 4 ms, a user terminal speed of 30 km/h led to as much as a 50% performance reduction versus the low-mobility scenario at 3 km/h.
In order to overcome the performance degradation caused by severely time-varying channels, [7] proposed a tractable user-centric CSI model and a robust beamforming design by taking deterministic equivalents [8] into account. However, when the user terminal moves at a high speed, the channel shows nonstationary characteristics and its statistical properties also change with time, so such statistics-based designs cannot be used for the beamforming of high-speed time-varying channels.
Another approach is to use a channel prediction algorithm [9,10,11,12,13,14,15,16] to obtain more accurate CSI for beamforming. The kernel method is widely used in channel prediction algorithms due to its ability to track nonlinear channels and its adaptation to time-varying channels [17,18,19,20,21,22]. However, algorithms based on kernel methods face a problem in online prediction: the network structure grows over time with the number of training samples. Although researchers have proposed sparseness methods that limit the number of samples by setting prerequisites for adding new samples to the dictionary, these methods cannot precisely control the size of the kernel dictionary. Therefore, this paper proposes a new channel prediction algorithm based on the kernel method that maintains the size of the kernel dictionary within a fixed budget while accurately tracking fast time-varying dynamic characteristics.

1.1. Related Work

State-of-the-art channel fading prediction algorithms were reviewed in [9]. Parametric radio channel-based methods [10] assume that the channel changes faster than the multipath parameters, so estimating these parameters can help extrapolate the channel into the future. However, the time over which the multipath parameters remain static is inversely proportional to the terminal moving speed, rendering channel prediction based on radio parameters inappropriate for high-speed scenarios. Autoregressive (AR) model-based methods [11,13] do not explicitly model physical scattering; they treat the time-varying channel as a stochastic wide-sense-stationary (WSS) process whose temporal autocorrelation function is used for prediction. Nevertheless, because of their linear correlation assumption, they are not capable of predicting ephemeral variations in non-wide-sense-stationary (NWSS) channels. For NWSS channels, many studies have applied machine learning to channel prediction. The authors in [14] proposed a backpropagation (BP) framework for channel prediction in backscatter communication networks that considers both spatial and frequency diversity. In [15], the authors developed a machine-learning method for predicting a mobile communication channel on the basis of a specific type of convolutional neural network. Although these algorithms can achieve good performance, they need to build neural networks, require a large number of samples for training, and have high complexity. Channel prediction methods based on support vector machines [16,23] use Mercer's theorem [24] to map the channel sample space to a high-dimensional space and perform linear regression there, which handles the problem of tracking nonlinear channels well. The kernel recursive least-squares (KRLS) algorithm [17,18,19,20,21,22] is a nonlinear version of the recursive least-squares (RLS) algorithm. It can not only solve nonlinear problems, but also adaptively iterate the model parameters.
However, the network structure of KRLS grows with the number of training samples. In order to solve this problem and render the online kernel method feasible, researchers have proposed various sparseness methods or criteria, such as approximate linear dependency (ALD) [17], the novelty criterion (NC) [20], the surprise criterion (SC) [21], and the coherence criterion (CC) [22]. On the basis of these criteria, only new input samples that meet the prerequisites are added to the dictionary. However, these sparseness methods cannot precisely control the size of the kernel dictionary, which motivated us to introduce a sliding window to control the dictionary size. Recently, some KRLS-based algorithms with an inserted forgetting factor have achieved better performance than QKRLS algorithms [25,26] and ALD-KRLS, which motivated us to insert a forgetting factor into the proposed SSW-KRLS algorithm. This paper proposes a new sparse sliding-window KRLS algorithm in which a candidate discard set is selected through correlation analysis between the mapping vectors, in the kernel Hilbert space, of the new input sample and the existing samples in the kernel dictionary; the discarded sample is then determined in combination with its corresponding output to achieve dynamic sample updates.

1.2. Contributions

The main contributions of this paper are summarized as follows:
  • We propose a novel sparse sliding-window KRLS algorithm. To precisely control the number of samples, we introduce a sample budget as a size restriction. When the dictionary is smaller than the sample budget, the new sample is directly added to the dictionary. Otherwise, the best sample to discard is chosen according to our proposed criterion.
  • To differentiate the value of samples collected at different times, we introduce a forgetting matrix. By assigning different forgetting values to samples collected at different times, we quantify the time value of the samples. An older sample has a smaller forgetting value, which means that its time value is smaller. In this way, both the correlation between samples and their time value are considered when discarding old samples.
  • Regarding our new method for discarding old samples, we build a candidate set from which the sample to discard is chosen. The candidate set is obtained by adding the samples whose kernel functions with the new sample exceed a threshold and that are therefore highly correlated with the new sample. We then compute a weighted estimate of the output values of these samples and decide which sample to discard on the basis of the deviation between each sample's output and the estimated value.

2. System Model

We considered two typical user movement scenarios: an urban road, where users move at 60 km/h, and a highway, where users move at 120 km/h. As shown in Figure 1, in TDD systems, the user sends an SRS, and the base station runs channel estimation algorithms such as LS and MMSE. The channel matrix is first estimated in the frequency domain; the channel frequency response is then transformed into the time domain by an IDFT. The noise is evenly distributed over the whole time domain and can easily be suppressed using a raised cosine window, so the effect of noise is very small compared with that of the user's high mobility. Due to channel reciprocity in TDD systems, the uplink CSI can be directly used in the design of downlink beamforming to realize cooperative signal processing. However, the nonstationary fast-fading characteristics of mobile environments bring challenges to multi-input multi-output (MIMO) communications.
When the terminal is moving at a high speed, the Doppler frequency shift grows, and the time variability of the channel is severe. The measured uplink CSI cannot represent the real channel state of the downlink slots, resulting in a mismatch between the measured CSI and the actual channel and, consequently, performance degradation. As shown in Figure 2, during one SRS reporting cycle, the user can be regarded as moving in a fixed direction at a fixed speed. In two adjacent time slots, due to the short distance moved by the user, the amplitudes and phases of the direct and scattered components are correlated. When the user speed is low, an AR-based channel prediction method can achieve good performance by exploiting the linear correlation between adjacent time slots. However, when the user is moving at a high speed, the channel correlation between adjacent time slots exhibits nonlinear characteristics, and kernel methods are needed to exploit them.
The user sends an SRS in special time slots, and the BS performs channel measurement and estimation to obtain the channel matrix. As shown in Figure 3, the channel matrix measured by the BS in the $i$-th SRS period is denoted by $\mathbf{H}_i \in \mathbb{C}^{N_r \times N_t}$, where $N_t$ and $N_r$ represent the numbers of antennas at the BS and the mobile UE, respectively. The real and imaginary parts are processed separately in the subsequent algorithm, so $\mathbf{H}$ denotes a real matrix in the following. The prediction order means that each channel matrix is related to the channel matrices measured in the previous $\mathrm{order}$ SRS periods, so the input vector of the prediction system is

$$\mathbf{u}_i = \left[ \mathbf{H}_i, \mathbf{H}_{i+1}, \ldots, \mathbf{H}_{i+\mathrm{order}-1} \right]. \quad (1)$$

There are complex and unknown dependencies between $\mathbf{u}_i$ and the next channel matrix, $\mathbf{H}_{i+\mathrm{order}} = f(\mathbf{u}_i) = f(\mathbf{H}_i, \mathbf{H}_{i+1}, \ldots, \mathbf{H}_{i+\mathrm{order}-1})$, that need to be exploited by kernel methods.
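To make the construction of the training pairs concrete, the following Python sketch builds the input-output pairs from a sequence of measured channel matrices; the function name, the flattening of each matrix into a vector, and the random test data are illustrative assumptions rather than details from the paper.

```python
import numpy as np

def build_training_pairs(H_seq, order):
    """Form input-output pairs from a sequence of real-valued channel
    matrices, one pair per SRS period (a sketch of the setup above;
    real and imaginary parts are handled as two separate sequences)."""
    inputs, outputs = [], []
    for i in range(len(H_seq) - order):
        # u_i stacks the `order` most recent matrices into one vector
        u_i = np.concatenate([H.ravel() for H in H_seq[i:i + order]])
        inputs.append(u_i)
        # the target is the channel matrix of the next SRS period
        outputs.append(H_seq[i + order].ravel())
    return np.array(inputs), np.array(outputs)

# Example with synthetic data: 200 SRS periods of a 4x64 channel, order 5
rng = np.random.default_rng(0)
H_seq = [rng.standard_normal((4, 64)) for _ in range(200)]
U, Y = build_training_pairs(H_seq, order=5)
print(U.shape, Y.shape)  # (195, 1280) (195, 256)
```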

3. Traditional KRLS Algorithm

In this section, we introduce the traditional KRLS algorithm and several extensions to the KRLS algorithm.

3.1. Traditional KRLS Algorithm

Assume a set of ordered input-output pairs $\mathcal{D} = \{ (\mathbf{u}_i, y_i) \}_{i=1}^{m}$, where $m$ is the total number of samples, the $\mathbf{u}_i \in \mathbb{R}^m$ are $m$-dimensional input vectors, and $y_i \in \mathbb{R}$ is the output. We call $\mathcal{D}$ a dictionary that records the input-output pairs collected before the $i$-th time slot. First, according to Mercer's theorem, we can adopt a nonlinear function $\varphi(\cdot): \mathbb{R}^m \to \mathbb{R}^c$ to transform the data into a high-dimensional feature space. The corresponding kernel function is $\kappa(\mathbf{u}, \mathbf{v}) = \langle \varphi(\mathbf{u}), \varphi(\mathbf{v}) \rangle$. We need to minimize the cost function

$$J = \sum_{j=1}^{i} \left| y_j - \mathbf{w}_i^{T} \varphi_j \right|^2 = \left\| \boldsymbol{\Phi}_i^{T} \mathbf{w}_i - \mathbf{y}_i \right\|^2, \quad (2)$$

where $\boldsymbol{\Phi}_i = \left[ \varphi(\mathbf{u}_1), \ldots, \varphi(\mathbf{u}_i) \right] = \left[ \varphi_1, \ldots, \varphi_i \right]$ is a high-dimensional matrix. By minimizing the cost function (2), we obtain the weight $\mathbf{w}_i = \boldsymbol{\Phi}_i^{\dagger} \mathbf{y}_i$, where $\boldsymbol{\Phi}_i^{\dagger}$ is the pseudoinverse of $\boldsymbol{\Phi}_i$. This cannot be solved directly because the kernel function $\kappa(\mathbf{u}, \mathbf{v})$ corresponds to a high-dimensional mapping whose exact form is unknown. To avoid overfitting when the number of samples is small, we use L2 regularization, so the cost function is reformulated as

$$J = \left\| \boldsymbol{\Phi}_i^{T} \mathbf{w}_i - \mathbf{y}_i \right\|^2 + \lambda \left\| \mathbf{w}_i \right\|^2. \quad (3)$$
By setting $\partial J / \partial \mathbf{w}_i = 0$, we obtain

$$\mathbf{w}_i = \boldsymbol{\Phi}_i \left[ \lambda \mathbf{I} + \boldsymbol{\Phi}_i^{T} \boldsymbol{\Phi}_i \right]^{-1} \mathbf{y}_i = \boldsymbol{\Phi}_i \boldsymbol{\alpha}(i). \quad (4)$$

By substituting (4) into (3), the cost function can be reformulated as

$$J = \left\| \boldsymbol{\Phi}_i^{T} \boldsymbol{\Phi}_i \boldsymbol{\alpha}(i) - \mathbf{y}_i \right\|^2 + \lambda \left\| \mathbf{w}_i \right\|^2 = \left\| \mathbf{K}_i \boldsymbol{\alpha}(i) - \mathbf{y}_i \right\|^2 + \lambda \left\| \mathbf{w}_i \right\|^2, \quad (5)$$

where $\mathbf{K}_i$ is the kernel matrix whose element in the $i$-th row and $j$-th column is $\mathbf{K}_i(i, j) = \kappa(\mathbf{u}_i, \mathbf{u}_j)$. The problem is to solve

$$\boldsymbol{\alpha}(i) = \left[ \lambda \mathbf{I} + \boldsymbol{\Phi}_i^{T} \boldsymbol{\Phi}_i \right]^{-1} \mathbf{y}_i = \left[ \lambda \mathbf{I} + \mathbf{K}_i \right]^{-1} \mathbf{y}_i = \mathbf{Q}(i) \mathbf{y}_i. \quad (6)$$

The inverse of $\mathbf{Q}(i)$ can be written as

$$\mathbf{Q}(i)^{-1} = \lambda \mathbf{I} + \mathbf{K}_i = \begin{bmatrix} \mathbf{Q}(i-1)^{-1} & \mathbf{k}_i \\ \mathbf{k}_i^{T} & \lambda + \kappa(\mathbf{u}_i, \mathbf{u}_i) \end{bmatrix}, \quad (7)$$

where $\mathbf{k}_i = \boldsymbol{\Phi}_{i-1}^{T} \varphi_i$. Thus, $\mathbf{Q}(i)$ can be obtained using the inversion of the partitioned matrix:

$$\mathbf{Q}(i) = r(i)^{-1} \begin{bmatrix} \mathbf{Q}(i-1) r(i) + \mathbf{z}(i) \mathbf{z}(i)^{T} & -\mathbf{z}(i) \\ -\mathbf{z}(i)^{T} & 1 \end{bmatrix}, \quad (8)$$

where $\mathbf{z}(i) = \mathbf{Q}(i-1) \mathbf{k}_i$ and $r(i) = \lambda + \varphi_i^{T} \varphi_i - \mathbf{z}(i)^{T} \mathbf{k}_i$.
The traditional KRLS algorithm is summarized in Algorithm 1.
Algorithm 1 Traditional KRLS algorithm.
1: Initialize $\mathbf{Q}(1) = \left[ \lambda + \kappa(\mathbf{u}_1, \mathbf{u}_1) \right]^{-1}$ and $\boldsymbol{\alpha}(1) = \mathbf{Q}(1) y_1$.
2: Iterate for $i > 1$: compute $\mathbf{k}_i = \boldsymbol{\Phi}_{i-1}^{T} \varphi_i$, $\mathbf{z}(i) = \mathbf{Q}(i-1) \mathbf{k}_i$, and $r(i) = \lambda + \varphi_i^{T} \varphi_i - \mathbf{z}(i)^{T} \mathbf{k}_i$; then update
$$\mathbf{Q}(i) = r(i)^{-1} \begin{bmatrix} \mathbf{Q}(i-1) r(i) + \mathbf{z}(i) \mathbf{z}(i)^{T} & -\mathbf{z}(i) \\ -\mathbf{z}(i)^{T} & 1 \end{bmatrix}.$$
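For reference, a minimal Python sketch of the regularized KRLS recursion in Algorithm 1 is given below, for scalar outputs $y_i \in \mathbb{R}$ as in the text. The Gaussian kernel and its width are assumptions (the derivation only requires a Mercer kernel), and the weight update for $\boldsymbol{\alpha}(i)$ is the standard form from Engel et al. [17], which Algorithm 1 leaves implicit.

```python
import numpy as np

def gauss_kernel(u, v, sigma=1.0):
    # Gaussian (RBF) kernel; the kernel choice and sigma are assumptions
    return np.exp(-np.sum((u - v) ** 2) / (2.0 * sigma ** 2))

class KRLS:
    """Sketch of the regularized KRLS recursion (Algorithm 1)."""

    def __init__(self, lam=1e-2, sigma=1.0):
        self.lam, self.sigma = lam, sigma
        self.dict, self.Q, self.alpha = [], None, None

    def update(self, u, y):
        if self.Q is None:  # step 1: initialization
            k_uu = gauss_kernel(u, u, self.sigma)
            self.Q = np.array([[1.0 / (self.lam + k_uu)]])
            self.alpha = self.Q @ np.array([y])
            self.dict = [u]
            return
        # step 2: k_i, z(i), r(i), then the partitioned update of Q(i)
        k = np.array([gauss_kernel(u, v, self.sigma) for v in self.dict])
        z = self.Q @ k
        r = self.lam + gauss_kernel(u, u, self.sigma) - z @ k
        n = len(self.dict)
        Q_new = np.empty((n + 1, n + 1))
        Q_new[:n, :n] = self.Q * r + np.outer(z, z)
        Q_new[:n, n] = -z
        Q_new[n, :n] = -z
        Q_new[n, n] = 1.0
        self.Q = Q_new / r
        e = y - k @ self.alpha  # a priori prediction error
        self.alpha = np.concatenate([self.alpha - z * e / r, [e / r]])
        self.dict.append(u)

    def predict(self, u):
        k = np.array([gauss_kernel(u, v, self.sigma) for v in self.dict])
        return k @ self.alpha
```

Note how the dictionary, and hence the cost of every update, grows by one entry per sample; this is exactly the limitation the sliding-window scheme of Section 4 removes.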
By relying on the kernel trick, the traditional KRLS algorithm can deal with nonlinear problems by nonlinearly transforming data into a high-dimensional reproducing kernel Hilbert space, similar to other techniques such as support vector machines (SVMs). Compared with SVMs, it avoids large-scale high-dimensional computation through iterative updates. However, the network of the traditional KRLS algorithm grows linearly with the number of processed data, leading to growing complexity for each consecutive update if no additional measures are taken. In order to render online kernel algorithms feasible, this growth is typically slowed down by approximately representing the solution using only a subset of bases that are considered relevant according to a chosen criterion. Our proposed SSW-KRLS algorithm takes similar measures to avoid an increase in computation, which is discussed in detail in Section 4.

3.2. Extensions to KRLS Algorithm

The kernel recursive least-squares method is one of the most efficient online kernel methods, and it achieves good performance in nonlinear fitting and prediction. However, its bottleneck is that the network structure grows with the number of training samples, which leads to insufficient memory and excessive computational complexity when processing continuously incoming signals. In order to solve this problem and render the online kernel method feasible, researchers have proposed various sparseness methods or criteria, such as approximate linear dependency (ALD), the novelty criterion (NC), the surprise criterion (SC), and the coherence criterion (CC). On the basis of these criteria, only new input samples that meet the prerequisites are used as training samples; thus, the growth of the network structure is effectively slowed down. Table 1 shows several common sparseness criteria.
The performance of the above methods in filtering samples largely depends on the selected threshold. As time goes by, the number of samples increases slowly and finally stabilizes, while the update of the model parameters tends to become slow. This is not conducive to tracking time-varying channels. Furthermore, when the user moves into a new environment, outdated samples are retained, which degrades channel prediction.
An effective way to keep the dictionary updated while precisely controlling its size is to use a sliding window. A simple implementation discards the oldest sample each time a new sample is collected, but this ignores the correlation between samples. Sparsification based only on time information is unreliable and may result in unstable prediction performance when the window is small. To solve this problem, we propose a new algorithm, SSW-KRLS, that takes both the time value of samples and the correlation between them into consideration.

4. Proposed SSW-KRLS Algorithm

Assume that there is a set of input-output pairs $\mathcal{D} = \{ (\mathbf{u}_i, y_i) \}_{i=1}^{L}$, where $L$ is the total number of samples, the $\mathbf{u}_i \in \mathbb{R}^m$ are $m$-dimensional input vectors, and $y_i \in \mathbb{R}$ is the output. In the traditional KRLS algorithm, the size of the dictionary increases every time a new sample arrives, which leads to increasing computation and memory requirements. To solve this problem, we use a sliding-window approach to keep the size of the kernel dictionary within a fixed budget. Our criterion for discarding old samples is based on the correlation between the existing samples in the kernel dictionary and the new sample. Moreover, we introduce a forgetting factor that exponentially down-weights older data, so as to track the dynamic characteristics of the channel.
In our proposed SSW-KRLS algorithm, we solve the following least-squares cost function:

$$\min_{\mathbf{w}} \; \sum_{j=1}^{i} \beta^{i-j} \left| y_j - \mathbf{w}^{T} \varphi_j \right|^2 + \lambda \beta^{i} \mathbf{w}^{T} \mathbf{w}, \quad (9)$$

where $\beta$ is the forgetting factor, $i$ is the iteration number, $\mathbf{y}_i = (y_1, \ldots, y_i)^{T}$ is the output vector, $\mathbf{w}$ is the weight vector, $\mathbf{B}(i) = \mathrm{diag}(\beta^{L-1}, \beta^{L-2}, \ldots, 1)$ is the forgetting matrix, $\varphi_j = \varphi(\mathbf{u}_j)$ is the transformation of $\mathbf{u}_j$, and $\lambda$ is the regularization parameter. The optimal $\mathbf{w}^{*}$ is

$$\mathbf{w}^{*} = \left[ \lambda \mathbf{B}(i) + \boldsymbol{\Phi}_i \mathbf{B}(i) \boldsymbol{\Phi}_i^{T} \right]^{-1} \boldsymbol{\Phi}_i \mathbf{B}(i) \mathbf{y}_i. \quad (10)$$

We reformulate this equation as

$$\mathbf{w}^{*} = \boldsymbol{\Phi}_i \left[ \lambda \mathbf{B}(i) + \mathbf{B}(i) \mathbf{K}_i \right]^{-1} \bar{\mathbf{y}}_i, \quad (11)$$

where $\bar{\mathbf{y}}_i = \mathbf{B}(i) \mathbf{y}_i$ is the exponentially weighted output vector.
The solution for $\mathbf{Q}(i) = \left[ \lambda \mathbf{B}(i) + \mathbf{B}(i) \mathbf{K}_i \right]^{-1}$ differs between two cases in the $i$-th iteration: in one case the size of the dictionary increases, and in the other the size of the dictionary remains unchanged. Cases I and II are discussed in Section 4.2 and Section 4.3, respectively. In Section 4.1, we introduce how to update the dictionary when a new sample arrives. Our method depends on whether the size of the dictionary has reached the fixed sample budget, denoted by $M$. In particular, when the dictionary is not full, an old sample may still be discarded on the basis of the correlation analysis between the new and existing samples, in order to slow down the growth of the dictionary. When the dictionary is full, we build a candidate discard set containing the samples highly correlated with the new sample and determine which sample to discard on the basis of their corresponding outputs. The whole process of our proposed algorithm is shown in Figure 4.

4.1. How to Optimally Discard an Old Sample

The cosine value is usually adopted for judging the correlation of two vectors. In the KRLS algorithm, the cosine value is calculated in the kernel Hilbert space. Suppose $\mathbf{u}_i$ is a new sample and $\mathbf{u}_j$ is an existing sample in the dictionary. Let $\mathbf{k}(\mathbf{u}_j)$ denote the kernel vector of $\mathbf{u}_j \in \mathcal{D}$, i.e., $\mathbf{k}(\mathbf{u}_j) = \{ k_{jn} \}_{n \neq i, j}$, where $k_{jn} = \kappa(\mathbf{u}_j, \mathbf{u}_n)$. To measure the correlation between an existing sample and the new sample, we calculate the cosine value of $\mathbf{k}(\mathbf{u}_i)$ and $\mathbf{k}(\mathbf{u}_j)$ for $\mathbf{u}_j \in \mathcal{D}$ as $\cos \langle \mathbf{k}(\mathbf{u}_i), \mathbf{k}(\mathbf{u}_j) \rangle = \frac{ \mathbf{k}(\mathbf{u}_i) \mathbf{k}(\mathbf{u}_j)^{T} }{ \| \mathbf{k}(\mathbf{u}_i) \| \, \| \mathbf{k}(\mathbf{u}_j) \| }$. When the size of the dictionary has not reached the budget, we find the existing sample with the highest correlation with the new sample: $\mathbf{u}_{j^{*}}$ with $j^{*} = \arg\max_{j} \cos \langle \mathbf{k}(\mathbf{u}_i), \mathbf{k}(\mathbf{u}_j) \rangle$ is the most probable one to be discarded. We set a threshold $\tau$ and discard $\mathbf{u}_{j^{*}}$ if $\cos \langle \mathbf{k}(\mathbf{u}_i), \mathbf{k}(\mathbf{u}_{j^{*}}) \rangle \geq \tau$. The updated dictionary is

$$\mathcal{D}(i) = \begin{cases} \left\{ \mathcal{D}(i-1) \setminus \mathbf{u}_{j^{*}}, \, \mathbf{u}_i \right\}, & \text{if } \max_{j} \cos \langle \mathbf{k}(\mathbf{u}_i), \mathbf{k}(\mathbf{u}_j) \rangle \geq \tau \\ \left\{ \mathcal{D}(i-1), \, \mathbf{u}_i \right\}, & \text{if } \max_{j} \cos \langle \mathbf{k}(\mathbf{u}_i), \mathbf{k}(\mathbf{u}_j) \rangle < \tau \end{cases} \quad (12)$$
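A possible implementation of the correlation test in (12) is sketched below. For simplicity, the kernel vectors are computed against the full dictionary rather than the index set excluding $i$ and $j$, a minor simplification of the definition above; the function names are illustrative.

```python
import numpy as np

def kernel_vector(u, dictionary, kern):
    # k(u): kernel values between u and every sample in the dictionary
    return np.array([kern(u, v) for v in dictionary])

def discard_by_correlation(u_new, dictionary, kern, tau):
    """Sketch of rule (12), used while the dictionary is below budget:
    drop the existing sample whose kernel vector is most collinear with
    that of the new sample, but only if the cosine reaches tau."""
    k_new = kernel_vector(u_new, dictionary, kern)
    best_j, best_cos = None, -1.0
    for j, u_j in enumerate(dictionary):
        k_j = kernel_vector(u_j, dictionary, kern)
        c = (k_new @ k_j) / (np.linalg.norm(k_new) * np.linalg.norm(k_j))
        if c > best_cos:
            best_j, best_cos = j, c
    if best_cos >= tau:
        return best_j  # index of the sample to discard
    return None        # keep all existing samples; just add u_new
```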
In the other case, when the size of the dictionary has reached the fixed budget, one sample must be discarded from the kernel dictionary in each time slot. In our strategy, we first build a candidate discard set on the basis of correlation with the new sample, and then determine the optimal sample to discard according to its output value.

The candidate discard set $S$ is composed of the samples that are highly correlated with the new sample. Specifically, an existing sample $\mathbf{u}_j \in \mathcal{D}$ is added to $S$ if $\kappa(\mathbf{u}_i, \mathbf{u}_j) > \varepsilon$, where $\mathbf{u}_i$ is the new sample and $\varepsilon$ is a threshold. Among the samples in $S$, the one with the smallest value for prediction is either the one carrying information most similar to the new sample or an atypical one with little probability of occurring. We determine the sample to discard in combination with its corresponding output as follows.

Suppose that the candidate discard set is $S = \{ \mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n \}$. The weighted average of the output values is $\bar{y} = \sum_{j=1}^{n} w_j y_j$, where

$$w_j = \frac{ \kappa(\mathbf{u}_i, \mathbf{u}_j) }{ \sum_{j=1}^{n} \kappa(\mathbf{u}_i, \mathbf{u}_j) } \quad (13)$$

and $n$ is the number of samples in $S$.

The deviation between the output value of $\mathbf{u}_j \in S$ and the weighted average output value is $e_j = \left| y_j - \bar{y} \right|$. The sample with the maximal deviation $e_{\max}$ is not typical and may have occurred with small probability, while the one with the minimal deviation $e_{\min}$ carries information similar to that of the new sample. Suppose that the sample with the maximal deviation is $\mathbf{u}_{j_{\max}}$ and the sample with the minimal deviation is $\mathbf{u}_{j_{\min}}$. We choose one of these two samples to discard according to the product of $e_{\max}$ and $e_{\min}$: if the product is larger than the threshold $\tau$, $e_{\max}$ is large enough and we discard $\mathbf{u}_{j_{\max}}$; otherwise, $e_{\min}$ is small enough and we discard $\mathbf{u}_{j_{\min}}$. The updated dictionary is

$$\mathcal{D}(i) = \begin{cases} \left\{ \mathcal{D}(i-1) \setminus \mathbf{u}_{j_{\max}}, \, \mathbf{u}_i \right\}, & \text{if } e_{\max} e_{\min} \geq \tau \\ \left\{ \mathcal{D}(i-1) \setminus \mathbf{u}_{j_{\min}}, \, \mathbf{u}_i \right\}, & \text{if } e_{\max} e_{\min} < \tau \end{cases} \quad (14)$$
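The candidate-set rule in (13) and (14) can be sketched as follows. The fallback to the whole dictionary when no sample passes the $\varepsilon$ test is an assumption, since the paper does not specify this edge case.

```python
import numpy as np

def discard_from_candidates(u_new, dictionary, outputs, kern, eps, tau):
    """Sketch of rules (13)-(14), used when the dictionary is full:
    among samples highly correlated with the new one, drop either the
    least typical sample or the most redundant one."""
    cand = [j for j in range(len(dictionary))
            if kern(u_new, dictionary[j]) > eps]
    if not cand:  # assumption: fall back to all samples
        cand = list(range(len(dictionary)))
    # kernel-weighted average of the candidates' outputs, Equation (13)
    w = np.array([kern(u_new, dictionary[j]) for j in cand])
    w = w / w.sum()
    y_cand = np.array([outputs[j] for j in cand])
    y_bar = w @ y_cand
    dev = np.abs(y_cand - y_bar)
    j_max, j_min = cand[int(dev.argmax())], cand[int(dev.argmin())]
    # Equation (14): compare the product of the extreme deviations with tau
    return j_max if dev.max() * dev.min() >= tau else j_min
```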

4.2. Case I: The Size of $\mathcal{D}(i)$ Is Changed

As mentioned before, the prediction process depends on whether the size of the dictionary has increased or not. In Case I, when the size of the dictionary increases, we obtain from (11) that

$$\mathbf{w}_i = \boldsymbol{\Phi}_i \boldsymbol{\alpha}(i), \qquad \boldsymbol{\alpha}(i) = \mathbf{Q}(i) \bar{\mathbf{y}}_i, \qquad \mathbf{Q}(i) = \left[ \lambda \mathbf{B}(i) + \mathbf{B}(i) \mathbf{K}_i \right]^{-1}. \quad (15)$$

Combining $\mathbf{B}(i)$ with $\mathbf{y}_i$ gives the exponentially weighted vector $\bar{\mathbf{y}}_i$. We find that

$$\mathbf{B}(i) = \begin{bmatrix} \beta \mathbf{B}(i-1) & \mathbf{0} \\ \mathbf{0}^{T} & 1 \end{bmatrix}, \quad (16)$$

$$\mathbf{K}_i = \begin{bmatrix} \mathbf{K}_{i-1} & \mathbf{h}(i) \\ \mathbf{h}(i)^{T} & \kappa(\mathbf{u}_i, \mathbf{u}_i) \end{bmatrix}, \quad (17)$$

$$\bar{\mathbf{y}}_i = \begin{bmatrix} \beta \bar{\mathbf{y}}_{i-1} \\ y_i \end{bmatrix}, \quad (18)$$

where $\mathbf{h}(i) = \left[ \kappa(\mathbf{u}_1, \mathbf{u}_i), \kappa(\mathbf{u}_2, \mathbf{u}_i), \ldots, \kappa(\mathbf{u}_{i-1}, \mathbf{u}_i) \right]^{T}$. Thus,

$$\mathbf{B}(i) \mathbf{K}_i = \begin{bmatrix} \beta \mathbf{B}(i-1) \mathbf{K}_{i-1} & \beta \mathbf{B}(i-1) \mathbf{h}(i) \\ \mathbf{h}(i)^{T} & \kappa(\mathbf{u}_i, \mathbf{u}_i) \end{bmatrix}. \quad (19)$$

By substituting (19) into (15), we obtain

$$\mathbf{Q}(i) = \begin{bmatrix} \beta \mathbf{Q}(i-1)^{-1} & \beta \mathbf{B}(i-1) \mathbf{h}(i) \\ \mathbf{h}(i)^{T} & \kappa(\mathbf{u}_i, \mathbf{u}_i) + \lambda \end{bmatrix}^{-1}. \quad (20)$$

With partitioned matrix inversion, we can obtain $\mathbf{Q}(i)$ in recursive form:

$$\mathbf{Q}(i) = r(i)^{-1} \begin{bmatrix} \beta^{-1} r(i) \mathbf{Q}(i-1) + \beta^{-1} \mathbf{z}_B(i) \mathbf{z}(i)^{T} & -\mathbf{z}_B(i) \\ -\beta^{-1} \mathbf{z}(i)^{T} & 1 \end{bmatrix}, \quad (21)$$

where

$$\mathbf{z}_B(i) = \mathbf{Q}(i-1) \mathbf{B}(i-1) \mathbf{h}(i), \qquad r(i) = \kappa(\mathbf{u}_i, \mathbf{u}_i) + \lambda - \mathbf{h}(i)^{T} \mathbf{z}_B(i), \qquad \mathbf{z}(i) = \mathbf{Q}(i-1)^{T} \mathbf{h}(i), \quad (22)$$

and the weight vector is updated as

$$\boldsymbol{\alpha}(i) = \begin{bmatrix} \boldsymbol{\alpha}(i-1) - \mathbf{z}_B(i) r(i)^{-1} e(i) \\ r(i)^{-1} e(i) \end{bmatrix}, \quad (23)$$

where $e(i) = y_i - \mathbf{h}(i)^{T} \boldsymbol{\alpha}(i-1)$ is the prediction error in the $i$-th time slot.
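A sketch of the Case I update follows the recursion (21)-(23) as reconstructed above; the sign conventions mirror the standard KRLS update (8) and should be read as assumptions.

```python
import numpy as np

def case1_update(Q_prev, B_prev, alpha_prev, h, k_uu, y_new, lam, beta):
    """Case I: the dictionary grew by one sample, so Q, B, and alpha
    are extended by one row/column, per (16) and (21)-(23)."""
    z_B = Q_prev @ (B_prev @ h)   # z_B(i) in (22)
    z = Q_prev.T @ h              # z(i) in (22)
    r = k_uu + lam - h @ z_B      # r(i) in (22)
    n = Q_prev.shape[0]
    Q = np.empty((n + 1, n + 1))
    Q[:n, :n] = (Q_prev * r + np.outer(z_B, z)) / beta
    Q[:n, n] = -z_B
    Q[n, :n] = -z / beta
    Q[n, n] = 1.0
    Q /= r                        # Equation (21)
    e = y_new - h @ alpha_prev    # a priori prediction error
    alpha = np.concatenate([alpha_prev - z_B * e / r, [e / r]])  # (23)
    B = np.block([[beta * B_prev, np.zeros((n, 1))],
                  [np.zeros((1, n)), np.ones((1, 1))]])          # (16)
    return Q, B, alpha
```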

4.3. Case II: The Size of $\mathcal{D}(i)$ Is Unchanged

When the size of the dictionary does not change, the information of the discarded sample $\mathbf{u}_{j^{*}}$ must be deleted from $\mathbf{Q}(i-1)$. The information of $\mathbf{u}_{j^{*}}$ lies in the $j^{*}$-th column and $j^{*}$-th row of $\mathbf{Q}(i-1)$. In order not to disturb the update of the matrix, we move the $j^{*}$-th column and $j^{*}$-th row of $\mathbf{Q}(i-1)$ to the first column and first row, obtaining $\hat{\mathbf{Q}}(i-1)$. Correspondingly, we apply the same transformation to $\mathbf{K}_{i-1}$ and obtain $\hat{\mathbf{K}}(i-1)$. Since $\mathbf{Q}(i-1) = \left[ \lambda \mathbf{B}(i-1) + \mathbf{B}(i-1) \mathbf{K}_{i-1} \right]^{-1}$, after the movement, each $j$-th column with $j < j^{*}$ should be multiplied by $\beta$.
Let $\tilde{\mathbf{Q}}(i-1)$ and $\tilde{\mathbf{K}}(i-1)$ denote the matrices obtained after removing the first column and first row of $\hat{\mathbf{Q}}(i-1)$ and $\hat{\mathbf{K}}(i-1)$, respectively. Partitioning

$$\hat{\mathbf{Q}}(i-1) = \begin{bmatrix} e & \mathbf{f}^{T} \\ \mathbf{l} & \mathbf{G} \end{bmatrix}, \quad (24)$$

we can obtain the inversion according to Appendix B:

$$\tilde{\mathbf{Q}}(i-1)^{-1} = \mathbf{G} - \mathbf{l} \mathbf{f}^{T} / e. \quad (25)$$
The new matrix $\mathbf{Q}(i)$ can be formulated as a partitioned matrix:

$$\mathbf{Q}(i) = \begin{bmatrix} \beta \tilde{\mathbf{Q}}(i-1)^{-1} & \beta \mathbf{B}(i-1) \mathbf{k}(i) \\ \mathbf{k}(i)^{T} & \kappa(\mathbf{u}_i, \mathbf{u}_i) + \lambda \end{bmatrix}^{-1} = \begin{bmatrix} \mathbf{A} & \mathbf{b} \\ \mathbf{p}^{T} & \kappa(\mathbf{u}_i, \mathbf{u}_i) + \lambda \end{bmatrix}^{-1}. \quad (26)$$

Then, $\mathbf{Q}(i)$ can be obtained using the partitioned matrix inversion:

$$\mathbf{Q}(i) = \left[ \lambda \beta^{i} \mathbf{I} + \mathbf{B}(i) \mathbf{K}_i \right]^{-1} = \begin{bmatrix} \mathbf{A}^{-1} \left( \mathbf{I} + \mathbf{b} \mathbf{p}^{T} \mathbf{A}^{-1} g \right) & -\mathbf{A}^{-1} \mathbf{b} g \\ -\mathbf{p}^{T} \mathbf{A}^{-1} g & g \end{bmatrix}, \quad (27)$$

where $\mathbf{A}^{-1} = \beta^{-1} \tilde{\mathbf{K}}(i-1)^{-1} \mathbf{B}(i-1)^{-1}$ and $g = \left( \kappa(\mathbf{u}_i, \mathbf{u}_i) + \lambda - \mathbf{p}^{T} \mathbf{A}^{-1} \mathbf{b} \right)^{-1}$.

Then, the weight coefficients are updated:

$$\boldsymbol{\alpha}(i) = \mathbf{Q}(i) \bar{\mathbf{y}}_i = \mathbf{Q}(i) \mathbf{B}(i) \mathbf{y}_i, \quad (28)$$

where $\mathbf{y}_i$ is composed of the output values of the samples, $\mathbf{y}_i = \left[ y_1, y_2, \ldots, y_i \right]^{T}$. Lastly, we obtain the prediction value for the next time slot as $\hat{y}_{i+1} = \mathbf{k}(i)^{T} \boldsymbol{\alpha}(i)$.
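The permutation and pruning steps in (24) and (25) can be implemented compactly as below; the helper names are illustrative, and the $\beta$ rescaling of the columns with $j < j^{*}$ mentioned above is left to the caller for brevity.

```python
import numpy as np

def move_to_front(M, j):
    """Move the j-th row and column of M to the front, isolating the
    discarded sample's information (the hat transformation above)."""
    idx = [j] + [k for k in range(M.shape[0]) if k != j]
    return M[np.ix_(idx, idx)]

def pruned_inverse(Q_hat):
    """Equation (25): the inverse of the reduced matrix, computed from
    the blocks of Q_hat = [[e, f^T], [l, G]] without a fresh inversion."""
    e = Q_hat[0, 0]
    f = Q_hat[0, 1:]
    l = Q_hat[1:, 0]
    G = Q_hat[1:, 1:]
    return G - np.outer(l, f) / e
```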

5. Performance Evaluation

Based on the analysis above, we present the algorithmic steps in Algorithm 2. In this section, we show the simulation results of our proposed SSW-KRLS algorithm and compare its performance with that of the ALD-KRLS algorithm as a baseline. The basic simulation parameters are listed in Table 2. We adopted a 3D urban macro scenario and considered a typical carrier frequency of 3 GHz with a subcarrier spacing of 30 kHz. We considered a bandwidth of 20 MHz containing 51 resource blocks. The adopted channel model was the CDL-A channel model, and the DL precoder was RZF.
Algorithm 2 Proposed SSW-KRLS algorithm.
1: Initialize $\mathbf{Q}(1) = \left[ \lambda + \kappa(\mathbf{u}_1, \mathbf{u}_1) \right]^{-1}$, $\mathbf{B}(1) = 1$, $\boldsymbol{\alpha}(1) = \mathbf{Q}(1) y_1$, $\mathcal{D}(1) = \{ \mathbf{u}_1 \}$.
2: Step 1: Iterate for $i > 1$: let $L$ be the number of samples in $\mathcal{D}(i)$. If $L < M$, perform Step 2; otherwise, perform Step 3.
3: Step 2: For each sample $\mathbf{u}_j$ in $\mathcal{D}(i-1)$, compute $\cos \langle \mathbf{k}(\mathbf{u}_i), \mathbf{k}(\mathbf{u}_j) \rangle$. Find $j^{*} = \arg\max_{j} \cos \langle \mathbf{k}(\mathbf{u}_i), \mathbf{k}(\mathbf{u}_j) \rangle$. If $\max_{j} \cos \langle \mathbf{k}(\mathbf{u}_i), \mathbf{k}(\mathbf{u}_j) \rangle > \tau$, discard $\mathbf{u}_{j^{*}}$. Then, add the new sample $\mathbf{u}_i$ to the dictionary. Go to Step 4.
4: Step 3: Construct the candidate discard set $S = \{ \mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n \}$ and calculate the weighted average output $\bar{y}$. For each sample $\mathbf{u}_j \in S$, calculate $e_j = \left| y_j - \bar{y} \right|$. Find $j_{\max} = \arg\max_{j} e_j$ and $j_{\min} = \arg\min_{j} e_j$. If $e_{\min} e_{\max} > \tau$, set $\mathcal{D}(i) = \left\{ \mathcal{D}(i-1) \setminus \mathbf{u}_{j_{\max}}, \mathbf{u}_i \right\}$; otherwise, set $\mathcal{D}(i) = \left\{ \mathcal{D}(i-1) \setminus \mathbf{u}_{j_{\min}}, \mathbf{u}_i \right\}$. Go to Step 4.
5: Step 4: If $\mathcal{D}(i)$ is larger than $\mathcal{D}(i-1)$, perform Step 5; otherwise, perform Step 6.
6: Step 5: Calculate $\mathbf{Q}(i)$ according to (21), using the intermediate quantities $\mathbf{z}_B(i)$, $r(i)$, and $\mathbf{z}(i)$ from (22). Calculate $\boldsymbol{\alpha}(i)$ according to (23). The prediction value for the next time slot is $\hat{y}_{i+1} = \mathbf{k}(i)^{T} \boldsymbol{\alpha}(i)$.
7: Step 6: Move the $j^{*}$-th column and $j^{*}$-th row of $\mathbf{Q}(i-1)$ to the first column and first row, obtaining $\hat{\mathbf{Q}}(i-1)$; calculate $\tilde{\mathbf{Q}}(i-1)^{-1}$ by (24) and (25). Update $\mathbf{Q}(i)$ according to (26) and (27). Then, the prediction value is obtained on the basis of (28).
Figure 5 shows the normalized mean squared error (NMSE) performance of the different algorithms. We show the performance of the ALD-KRLS and SSW-KRLS algorithms at UE velocities of 60 and 120 km/h. When the UE speed was 60 km/h, the prediction algorithms were more accurate than at a UE speed of 120 km/h. For the SSW-KRLS algorithm, we set different sample budgets: M = 30, 90, and 150. The algorithm performed better with a higher sample budget. The SSW-KRLS algorithm with sample budget M = 30 performed better than the ALD-KRLS algorithm with v = 0.001, and the SSW-KRLS algorithm with sample budget M = 90 greatly outperformed the ALD-KRLS algorithm. However, the performance improved only marginally when the sample budget was increased from 90 to 150.
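For reference, the NMSE reported in Figure 5 can be computed as below; the exact normalization is an assumption, since the paper does not spell out the formula.

```python
import numpy as np

def nmse_db(H_pred, H_true):
    """Normalized mean squared error in dB (standard definition assumed):
    10*log10( ||H_pred - H_true||^2 / ||H_true||^2 )."""
    num = np.sum(np.abs(H_pred - H_true) ** 2)
    den = np.sum(np.abs(H_true) ** 2)
    return 10.0 * np.log10(num / den)
```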
Figure 6 shows the kernel dictionary size over 400 iterations. For all algorithms, the kernel dictionary first grew and then its growth slowed down; for SSW-KRLS, the size remained unchanged once the budget was reached. The SSW-KRLS algorithm with sample budget M = 90 outperformed the ALD-KRLS algorithm with v = 0.001 while using fewer samples, which indicates the superiority of our proposed SSW-KRLS algorithm.
Figure 7 and Figure 8 show the mean rate of mobile users under the different algorithms at speeds of 60 and 120 km/h, respectively. Comparing the figures at the two speed levels shows that users at the lower speed had a higher mean rate. In addition, performance was greatly enhanced by the use of a channel prediction algorithm. In particular, our proposed SSW-KRLS algorithm with sample budgets M = 150 and M = 90 achieved better performance than the ALD-KRLS algorithm with v = 0.0001, and for our proposed SSW-KRLS algorithm, a higher sample budget brought better performance. Moreover, users with more antennas showed better performance under all circumstances.

6. Conclusions

This paper proposed a new sparse sliding-window KRLS algorithm in which a candidate discard set is selected through correlation analysis between the mapping vectors, in the kernel Hilbert space, of the new input sample and the existing samples in the kernel dictionary. The discarded sample is then determined in combination with its corresponding output to achieve dynamic sample updates. Specifically, the proposed SSW-KRLS algorithm, which maintains the size of the kernel dictionary within the sample budget, requires a fixed amount of memory and computation per time step, incorporates regularization, and achieves online prediction. Moreover, in order to sufficiently track strongly changeable dynamic characteristics, a forgetting factor was incorporated into the proposed algorithm. Numerical simulations demonstrated that, under a realistic 3GPP channel model in a rich scattering environment, our proposed algorithm achieved superior performance, in terms of both predictive accuracy and kernel dictionary size, to that of the ALD-KRLS algorithm. The NMSE of the channel prediction of the SSW-KRLS algorithm with M = 90 was about 2 dB lower than that of the ALD-KRLS algorithm with v = 0.001, while the kernel dictionary was about 17% smaller.

Author Contributions

Investigation, X.A. and J.Z.; methodology, X.A.; project administration, X.A.; software, X.A.; resources, H.Z.; writing—original draft preparation, J.Z.; writing—review and editing, H.Z.; funding acquisition, X.A. and Y.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the ZTE Corporation Research Program.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

For a non-singular matrix $\mathbf{A}$, suppose a new column and a new row are appended to the matrix to obtain $\mathbf{E}$ as follows:

$$\mathbf{E} = \begin{bmatrix} \mathbf{A} & \mathbf{b} \\ \mathbf{b}^{T} & c \end{bmatrix}. \quad (A1)$$

Suppose the inverse of $\mathbf{E}$ is partitioned as

$$\mathbf{E}^{-1} = \begin{bmatrix} \mathbf{D} & \mathbf{e} \\ \mathbf{e}^{T} & f \end{bmatrix}. \quad (A2)$$

Then, $\mathbf{E}^{-1}$ can be calculated by solving

$$\mathbf{A} \mathbf{D} + \mathbf{b} \mathbf{e}^{T} = \mathbf{I}, \qquad \mathbf{A} \mathbf{e} + \mathbf{b} f = \mathbf{0}, \qquad \mathbf{b}^{T} \mathbf{e} + c f = 1, \quad (A3)$$

from which

$$\mathbf{E}^{-1} = \begin{bmatrix} \mathbf{A}^{-1} \left( \mathbf{I} + \mathbf{b} \mathbf{b}^{T} \mathbf{A}^{-1} f \right) & -f \mathbf{A}^{-1} \mathbf{b} \\ -f \mathbf{b}^{T} \mathbf{A}^{-1} & f \end{bmatrix}, \quad (A4)$$

where $f = \left( c - \mathbf{b}^{T} \mathbf{A}^{-1} \mathbf{b} \right)^{-1}$.
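The identity (A4) is easy to verify numerically; the following sketch checks it against a direct inverse on a random symmetric positive definite example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))
A = A @ A.T + n * np.eye(n)   # symmetric positive definite
b = rng.standard_normal(n)
c = float(b @ b) + 10.0       # keeps E non-singular
E = np.block([[A, b[:, None]], [b[None, :], np.array([[c]])]])

Ainv = np.linalg.inv(A)
f = 1.0 / (c - b @ Ainv @ b)
Ab = Ainv @ b
E_inv = np.block([[Ainv + f * np.outer(Ab, Ab), -f * Ab[:, None]],
                  [-f * Ab[None, :], np.array([[f]])]])
assert np.allclose(E_inv, np.linalg.inv(E))  # (A4) holds
```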

Appendix B

For a non-singular matrix $\mathbf{E}$, suppose the first column and the first row are removed from the matrix to obtain $\mathbf{C}$:

$$\mathbf{E} = \begin{bmatrix} a & \mathbf{b}^{T} \\ \mathbf{b} & \mathbf{C} \end{bmatrix}. \quad (A5)$$

Suppose the inverse of $\mathbf{E}$ is partitioned as

$$\mathbf{E}^{-1} = \begin{bmatrix} d & \mathbf{e}^{T} \\ \mathbf{e} & \mathbf{F} \end{bmatrix}. \quad (A6)$$

Then, $\mathbf{C}^{-1}$ can be calculated by solving

$$\mathbf{b} d + \mathbf{C} \mathbf{e} = \mathbf{0}, \qquad \mathbf{b} \mathbf{e}^{T} + \mathbf{C} \mathbf{F} = \mathbf{I}, \quad (A7)$$

from which $\mathbf{C}^{-1} = \mathbf{F} - \mathbf{e} \mathbf{e}^{T} / d$.
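Similarly, the pruning identity $\mathbf{C}^{-1} = \mathbf{F} - \mathbf{e}\mathbf{e}^{T}/d$ can be verified numerically:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
E = rng.standard_normal((n, n))
E = E @ E.T + n * np.eye(n)   # symmetric positive definite, non-singular
Einv = np.linalg.inv(E)
d, e, F = Einv[0, 0], Einv[1:, 0], Einv[1:, 1:]
C = E[1:, 1:]                 # E with first row and column removed
assert np.allclose(np.linalg.inv(C), F - np.outer(e, e) / d)
```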

References

  1. Li, M.; Collings, I.B.; Hanly, S.V.; Liu, C.; Whiting, P. Multicell Coordinated Scheduling with Multiuser Zero-Forcing Beamforming. IEEE Trans. Wirel. Commun. 2016, 15, 827–842.
  2. Yin, H.; Wang, H.; Liu, Y.; Gesbert, D. Addressing the Curse of Mobility in Massive MIMO with Prony-Based Angular-Delay Domain Channel Predictions. IEEE J. Sel. Areas Commun. 2020, 38, 2903–2917.
  3. Li, X.; Jin, S.; Suraweera, H.A.; Hou, J.; Gao, X. Statistical 3-D Beamforming for Large-Scale MIMO Downlink Systems over Rician Fading Channels. IEEE Trans. Commun. 2016, 64, 1529–1543.
  4. Sapavath, N.N.; Rawat, D.B.; Song, M. Machine Learning for RF Slicing Using CSI Prediction in Software Defined Large-Scale MIMO Wireless Networks. IEEE Trans. Netw. Sci. Eng. 2020, 7, 2137–2144.
  5. Huang, C.; Liu, L.; Yuen, C.; Sun, S. Iterative Channel Estimation Using LSE and Sparse Message Passing for MmWave MIMO Systems. IEEE Trans. Signal Process. 2019, 67, 245–259.
  6. Bellili, F.; Sohrabi, F.; Yu, W. Generalized Approximate Message Passing for Massive MIMO mmWave Channel Estimation With Laplacian Prior. IEEE Trans. Commun. 2019, 67, 3205–3219.
  7. Wu, D.; Zhang, H. Tractable Modelling and Robust Coordinated Beamforming Design with Partially Accurate CSI. IEEE Wirel. Commun. Lett. 2021, 10, 2384–2387.
  8. Lu, A.; Gao, X.; Xiao, C. Free Deterministic Equivalents for the Analysis of MIMO Multiple Access Channel. IEEE Trans. Inf. Theory 2016, 62, 4604–4629.
  9. Wu, C.; Yi, X.; Zhu, Y.; Wang, W.; You, L.; Gao, X. Channel Prediction in High-Mobility Massive MIMO: From Spatio-Temporal Autoregression to Deep Learning. IEEE J. Sel. Areas Commun. 2021, 39, 1915–1930.
  10. Chen, M.; Viberg, M. Long-Range Channel Prediction Based on Nonstationary Parametric Modeling. IEEE Trans. Signal Process. 2009, 57, 622–634.
  11. Liu, L.; Feng, H.; Yang, T.; Hu, B. MIMO-OFDM Wireless Channel Prediction by Exploiting Spatial-Temporal Correlation. IEEE Trans. Wirel. Commun. 2014, 13, 310–319.
  12. Lv, C.; Lin, J.-C.; Yang, Z. Channel Prediction for Millimeter Wave MIMO-OFDM Communications in Rapidly Time-Varying Frequency-Selective Fading Channels. IEEE Access 2019, 7, 15183–15195.
  13. Yuan, J.; Ngo, H.Q.; Matthaiou, M. Machine Learning-Based Channel Prediction in Massive MIMO with Channel Aging. IEEE Trans. Wirel. Commun. 2020, 19, 2960–2973.
  14. Zhao, J.; Tian, H.; Li, D. Channel Prediction Based on BP Neural Network for Backscatter Communication Networks. Sensors 2020, 20, 300.
  15. Ahrens, J.; Ahrens, L.; Schotten, H.D. A Machine Learning Method for Prediction of Multipath Channels. ZTE Commun. 2019, 17, 12–18.
  16. Sanchez-Fernandez, M.; de-Prado-Cumplido, M.; Arenas-Garcia, J.; Perez-Cruz, F. SVM multiregression for nonlinear channel estimation in multiple-input multiple-output systems. IEEE Trans. Signal Process. 2004, 52, 2298–2307.
  17. Engel, Y.; Mannor, S.; Meir, R. The kernel recursive least-squares algorithm. IEEE Trans. Signal Process. 2004, 52, 2275–2285.
  18. Liu, W.; Park, I.; Wang, Y.; Principe, J.C. Extended Kernel Recursive Least Squares Algorithm. IEEE Trans. Signal Process. 2009, 57, 3801–3814.
  19. Guo, J.; Chen, H.; Chen, S. Improved Kernel Recursive Least Squares Algorithm Based Online Prediction for Nonstationary Time Series. IEEE Signal Process. Lett. 2020, 27, 1365–1369.
  20. Platt, J. A Resource-Allocating Network for Function Interpolation. Neural Comput. 1991, 3, 213–225.
  21. Liu, W.; Park, I.; Principe, J.C. An Information Theoretic Approach of Designing Sparse Kernel Adaptive Filters. IEEE Trans. Neural Netw. 2009, 20, 1950–1961.
  22. Richard, C.; Bermudez, J.C.M.; Honeine, P. Online Prediction of Time Series Data with Kernels. IEEE Trans. Signal Process. 2009, 57, 1058–1067.
  23. Yang, M.; Ai, B.; He, R.; Huang, C.; Ma, Z.; Zhong, Z.; Wang, J.; Pei, L.; Li, Y.; Li, J. Machine-Learning-Based Fast Angle-of-Arrival Recognition for Vehicular Communications. IEEE Trans. Veh. Technol. 2021, 70, 1592–1605.
  24. Cherkassky, V.; Mulier, F.M. Statistical Learning Theory. In Learning from Data: Concepts, Theory, and Methods, 1st ed.; John Wiley & Sons: Hoboken, NJ, USA, 2007.
  25. Vaerenbergh, S.V.; Lazaro-Gredilla, M.; Santamaria, I. Kernel Recursive Least-Squares Tracker for Time-Varying Regression. IEEE Trans. Neural Netw. Learn. Syst. 2012, 23, 1313–1326.
  26. Xiong, K.; Wang, S. The Online Random Fourier Features Conjugate Gradient Algorithm. IEEE Signal Process. Lett. 2019, 26, 740–744.
Figure 1. In TDD mobile communication systems, the beamforming performance for a high-speed mobile terminal worsens. The user sends an SRS, and the base station runs channel estimation algorithms and performs beamforming. However, due to the user's movement, the terminal can only obtain sidelobe gain in the downlink slots.
Figure 2. In TDD mobile communication systems, the interval between slots S and D is short in two adjacent SRS periods, and the LoS and scattering components are correlated.
Figure 3. An illustration of the input and output pairs in a channel prediction module.
Figure 4. An illustration of the channel prediction algorithm steps.
Figure 5. NMSE comparison of the proposed SSW-KRLS algorithm and the ALD-KRLS algorithm. $\beta = 0.97$, $N_t = 64$, $N_r = 4$.
Figure 6. The kernel dictionary size of the proposed SSW-KRLS algorithm and the ALD-KRLS algorithm varies with the number of iterations. $\beta = 0.97$, $N_t = 64$, $N_r = 4$.
Figure 7. Performance comparison among no prediction (last uplink measurement), the ALD-KRLS algorithm, and the proposed SSW-KRLS algorithm. $\beta = 0.97$, $v$ = 60 km/h.
Figure 8. Performance comparison among no prediction (last uplink measurement), the ALD-KRLS algorithm, and the proposed SSW-KRLS algorithm. $\beta = 0.97$, $v$ = 120 km/h.
Table 1. Some common sparseness criteria.

ALD. Indicator: determine whether the kernel function of the new sample can be linearly represented by the kernel functions of the existing samples in the dictionary: $\delta(n) = \min_{\mathbf{a}} \left\| \sum_{i=1}^{m} a(i) \varphi(\mathbf{x}_i) - \varphi(\mathbf{x}_n) \right\|^2$. Handling: if $\delta(n) < \tau$, discard the new sample; if $\delta(n) \geq \tau$, add the new sample to the dictionary.

NC. Indicator: calculate the minimal distance between the new sample and the existing samples in the kernel dictionary: $dis = \min_{i} \left\| \varphi(\mathbf{x}_i) - \varphi(\mathbf{x}_n) \right\|^2$. Handling: if $dis < \tau$, discard the new sample; if $dis \geq \tau$, add the new sample to the dictionary.

SC. Indicator: according to information theory, based on a prior joint Gaussian distribution, the amount of information brought by the new sample is $S(n) = -\ln p(\mathbf{x}_n, d_n \mid \mathcal{D}_{n-1})$, where $p(\mathbf{x}_n, d_n \mid \mathcal{D}_{n-1})$ is the posterior probability distribution of $(\mathbf{x}_n, d_n)$. Handling: if $\tau_1 < S(n) < \tau_2$, add the new sample to the dictionary; if $S(n) > \tau_2$, discard the new sample.

CC. Indicator: calculate the maximal kernel function between the new sample and the existing samples: $\mu = \max_{i} \kappa(\mathbf{x}_i, \mathbf{x}_n)$. Handling: if $\mu > \tau$, discard the new sample; if $\mu \leq \tau$, add the new sample to the dictionary.
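As an illustration of the first row of Table 1, the ALD statistic admits the closed form $\delta(n) = \kappa(\mathbf{x}_n, \mathbf{x}_n) - \mathbf{k}_n^{T} \mathbf{K}^{-1} \mathbf{k}_n$, where $\mathbf{K}$ is the kernel matrix of the dictionary and $\mathbf{k}_n$ the kernel vector of the new sample. The sketch below implements this; the function name and thresholding interface are assumptions.

```python
import numpy as np

def ald_admits(u_new, dictionary, kern, tau):
    """ALD criterion from Table 1: delta is the squared residual of
    projecting phi(u_new) onto the span of the dictionary's features."""
    K = np.array([[kern(a, b) for b in dictionary] for a in dictionary])
    k_n = np.array([kern(a, u_new) for a in dictionary])
    delta = kern(u_new, u_new) - k_n @ np.linalg.solve(K, k_n)
    return delta >= tau  # True: add the new sample to the dictionary
```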
Table 2. Simulation parameters.

Scenario: 3D urban macro (3D UMa)
Carrier frequency: 3 GHz
Subcarrier spacing: 30 kHz
Bandwidth: 20 MHz
Channel model: CDL-A
Delay spread: 100 ns
DL precoder: RZF
Prediction order: 5
