Article

A Reinforcement Learning Based Data Caching in Wireless Networks

Muhammad Sheraz, Shahryar Shafique, Sohail Imran, Muhammad Asif, Rizwan Ullah, Muhammad Ibrar, Jahanzeb Khan and Lunchakorn Wuttisittikulkij

1 Department of Electrical Engineering, Iqra National University, Peshawar 25000, Pakistan
2 Department of Electrical Engineering, Main Campus, University of Science & Technology, Bannu 28100, Pakistan
3 Wireless Communication Ecosystem Research Unit, Department of Electrical Engineering, Chulalongkorn University, Bangkok 10330, Thailand
4 Department of Physics, Islamia College Peshawar, Peshawar 25000, Pakistan
* Authors to whom correspondence should be addressed.
Appl. Sci. 2022, 12(11), 5692; https://doi.org/10.3390/app12115692
Submission received: 22 April 2022 / Revised: 20 May 2022 / Accepted: 31 May 2022 / Published: 3 June 2022
(This article belongs to the Topic Next Generation Intelligent Communications and Networks)

Abstract

Data caching has emerged as a promising technique to handle the growing data traffic and backhaul congestion of wireless networks. However, how and where to place contents to optimize users' data access remains an open concern. Data caching can be exploited close to users by deploying cache entities at Small Base Stations (SBSs). In this approach, SBSs cache contents through the core network during off-peak traffic hours; then, during peak traffic hours, SBSs serve the cached contents to content-demanding users with low latency. In this paper, we exploit the potential of data caching at the SBS level to minimize data access delay. We propose an intelligence-based data caching mechanism inspired by an artificial intelligence approach known as Reinforcement Learning (RL). Our proposed RL-based data caching mechanism adapts through dynamic learning and tracks network states to capture users' diverse and time-varying data demands. The proposed approach optimizes data caching at the SBS level by observing users' data demands and locations to efficiently utilize the limited cache resources of the SBS. Extensive simulations are performed to evaluate the performance of the proposed caching mechanism with respect to various factors such as caching capacity and data library size. The obtained results demonstrate that our proposed caching mechanism achieves a 4% performance gain in delay vs. contents, 3.5% in delay vs. users, 2.6% in delay vs. cache capacity, 18% in percentage traffic offloading vs. popularity skewness (γ), and 6% in backhaul saving vs. cache capacity.

1. Introduction

There is an unprecedented increase in mobile data traffic, driven by the advent of inexpensive smart electronic gadgets, which is overburdening the scarce spectrum resources below 6 GHz [1]. This tremendous burst of users' data requests imposes network overhead when providing data to users during peak-traffic hours. To handle high data demands, it is imperative to develop new communication techniques such as the Internet of Things (IoT), data caching, and cloud computing [2].
A hierarchical heterogeneous network comprising macro and small cells is envisioned for 4G by 3GPP to ameliorate network density [3]. Under the coverage of macro cells, multiple low-power, small-coverage Small Base Stations (SBSs) are deployed and connected to the core network through a wired backhaul link [4]. Conventionally, users fetch their required contents from the core network via serving SBSs, which not only incurs congestion at the serving SBS but also overburdens the backhaul link. Interestingly, users' data demands show a redundancy trend towards social networking, popular videos, and online gaming [5]. However, networks are equipped with limited operational capacity and are therefore unable to handle explosive data demands, resulting in data traffic congestion [6]. Modern data traffic manifests asynchronous reuse of the same contents [7], yet this large volume of data traffic incurs a backhaul bottleneck problem, which degrades the Quality of Experience (QoE). To relieve the traffic burden over the backhaul link, caching appears to be a promising approach that can serve contents to users directly [8].
Caching at the network edge is recognized as an efficient way to mitigate the bottleneck issues that arise from SBS densification [9]. Data placement at the SBS level has the potential to serve users with immediate effect, which can significantly reduce network delay [10]; it has therefore attracted tremendous attention from the research community. SBSs are located close to users, which allows better data access. Therefore, caching popular contents at the SBS level can fulfill users' demands with minimum delay [11]. There are three benefits of SBS-level caching. First, popular contents are placed close to the users, which reduces latency. Second, redundant data transmission over the backhaul and network congestion are reduced. Third, most of the users' data requests are fulfilled through cache resources rather than the core network, which alleviates traffic congestion on the backhaul link. These advantages motivate data service providers to adopt new advanced technologies to improve QoE. Due to the capacity constraint on cache resources, it is imperative to utilize the scarce cache resources in an intelligent manner [12]. For efficient data caching, the network must rely on accurate information in data requests to determine what to cache and where [13]. Furthermore, the performance of a caching scheme is affected by the fluctuation of the data traffic load over time, which leads to wasted bandwidth [14]. To counter demand fluctuation, data placement during off-peak hours is a promising approach for meeting future users' data demands and smoothing the data traffic load, and appears to be an efficient way to optimize bandwidth utilization. Hence, it is imperative to shift the backhaul data traffic burden from peak to off-peak hours and perform data placement close to the users based on the predicted data popularity profile.
There are two categories of data caching schemes: static and dynamic. In static caching schemes, cached contents remain the same [15]. Moreover, static caching schemes place contents based solely on the data popularity profile; the inter-file correlation is not exploited to improve network performance [16]. In contrast, dynamic caching schemes such as Least Frequently Used (LFU) and Least Recently Used (LRU) update their cached contents over time as new contents arrive, to meet users' diverse data demands [17] (a minimal sketch of both policies is given below). Also, stochastic network information can be exploited to optimize data placement; for instance, the diversity of data caching can be optimized by modeling the cache-enabled SBSs through a homogeneous Poisson Point Process (PPP) [18]. In addition, users' mobility poses significant challenges due to dynamic network connectivity and varying channel conditions; therefore, it is crucial to take users' mobility patterns into account [19]. Machine learning based caching schemes are a potential approach to optimize 5G cellular network performance by learning and tracking users' mobility patterns and spatio-temporal data popularity profiles [20].
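To make the LRU/LFU distinction concrete, the following is a minimal Python sketch of both replacement policies; the class interfaces and the request-driven eviction logic are illustrative and not drawn from any cited implementation:

```python
from collections import Counter, OrderedDict

class LRUCache:
    """Least Recently Used: evict the content accessed longest ago."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()  # ordered from least to most recently used

    def request(self, content_id):
        if content_id in self.store:
            self.store.move_to_end(content_id)   # refresh recency on a hit
            return True                          # cache hit
        if len(self.store) >= self.capacity:
            self.store.popitem(last=False)       # evict least recently used
        self.store[content_id] = True            # fetch over backhaul, cache
        return False                             # cache miss

class LFUCache:
    """Least Frequently Used: evict the content with the fewest requests."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = Counter()                  # request frequency per content
        self.store = set()

    def request(self, content_id):
        self.counts[content_id] += 1
        if content_id in self.store:
            return True                          # cache hit
        if len(self.store) >= self.capacity:
            victim = min(self.store, key=self.counts.__getitem__)
            self.store.remove(victim)            # evict least frequently used
        self.store.add(content_id)
        return False                             # cache miss
```

Both policies adapt the cache contents to the request stream, which is exactly the behavior that distinguishes them from static placement.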
Motivated by the above-mentioned prospects of data caching in wireless networks, in this paper we design an intelligent data caching approach that offloads more data traffic through cache resources to minimize network delay. We implement data caching while considering users' mobility and diverse data demands, both of which introduce significant challenges. In addition, an RL approach is devised to optimize data caching without requiring any prior information. Moreover, we validate the performance of our proposed caching scheme under various network parameters such as varying cache capacity and data popularity profiles. The prominent contributions of our work are summarized as follows:
  • We design a data caching approach in a 5G network comprising a hierarchical heterogeneous architecture to minimize network delay. Moreover, we consider practical assumptions such as inter-cell interference and users' mobility, which make our problem more arduous.
  • We model the data caching problem at the SBS level by considering users' evolving data demand patterns. The data demands and cache states are taken as the system states. Both the transmission of cached contents and the replacement of old contents with new, more popular contents constitute the system actions.
  • We propose a Q-learning based data caching scheme that follows a dynamic data popularity profile while taking cache capacity constraints and users' mobility into account. Notably, we formulate a tractable Markov Decision Process (MDP) in a mobility scenario, which is a highly challenging task.
  • Extensive simulations are conducted to demonstrate the performance of the proposed caching scheme in minimizing network delay. The performance of our proposed Q-learning based caching scheme is compared with four baseline caching schemes: learning based branch and bound (LB&B) [21], least frequently used content caching (LFUCC) [22], highly popular content caching (HPCC) [23], and random content caching (RCC) [24], under different network parameters such as varying numbers of users, cache capacities, data popularity skewness, and data library sizes.
The rest of the paper is organized as follows. Section 2 discusses the related works on data caching. Section 3 provides detail of the system model. In Section 4, we present problem formulation. In Section 5, we discuss the Q-learning-based proposed data caching scheme. In Section 6, we provide simulation results to demonstrate the performance gains of our proposed caching scheme. Finally, Section 7 concludes the paper. Furthermore, the key notations are listed in Table 1.

2. Related Works

There is unprecedented growth in data traffic from online social apps and platforms, which has strained the limited backhaul resources of wireless networks. Therefore, Nimrah et al. [25] proposed a proactive data caching mechanism that supports interest-based user grouping for data sharing, taking into account the complete information from the social graph for one-to-one interaction between users. The data caching entities are selected based on users' profiles and revenue status. Then, users' group membership is approximated by utilizing users' data demand logs without any prior information from the overlaid users' social graph. The obtained results demonstrated a 30% to 34% performance gain in data access via cache resources. In [26], the authors proposed data caching at aerial base stations and defined delay-tolerant and delay-sensitive users, which have different data transmission rate requirements. The authors proposed an iterative algorithm that applies a decomposition method. The obtained results showed a significant improvement in the utilization of backhaul resources.
Nowadays, many researchers are exploiting the potential of machine learning to optimize data caching in wireless networks. Shuja et al. [27] provided a comprehensive survey of machine learning approaches used in next generation edge networks. Doan et al. [28] proposed a caching mechanism that predicts data popularity profiles and estimates the popularity of new contents based on feature extraction from existing contents and their similarity to newly arrived contents. The authors extracted spatio-temporal features of existing and newly arrived contents with a convolutional neural network (CNN) with several pooling stages. Then, a clustering algorithm based on Jaccard and cosine distances was devised over the extracted content features to handle the dimensionality. Afterwards, contents were classified into various categories by employing a support vector machine (SVM). The obtained results showed significant performance gains.
However, in a realistic network users are mobile and the network conditions keep evolving. Hence, a mechanism is needed that does not require prior information to train the system and can learn through interaction with the environment. Therefore, researchers have considered implementing RL to optimize data caching in wireless networks. One such effort is made by Xiang et al. [29], where an RL approach is devised for content caching and wireless network slicing. The authors used a Zipf distribution to model users' data demands, and the wireless channel is modeled as a finite state Markov channel. In [30], the authors proposed data caching at the base station level; RL is devised to optimize cache allocation in an online fashion without any previous information. The proposed approach learns by directly perturbing the environment and observing the variation in network performance.

3. System Model

A cache-enabled wireless network is depicted in Figure 1, where cache-enabled SBSs are denoted by $\mathcal{S} = \{1, 2, \ldots, s, \ldots, S\}$. The cache capacity of each SBS is equal to C. The users existing in the network are denoted by $\mathcal{U} = \{1, 2, \ldots, u, \ldots, U\}$. The distributions of SBSs and users follow Poisson Point Processes (PPPs) with intensities $\lambda_S$ and $\lambda_U$, respectively. The connection of user u to serving SBS s is represented by a binary element $l_{s,u}$, where $l_{s,u} = 1$ if user u is connected to SBS s, and $l_{s,u} = 0$ otherwise. We consider that the network contains a data library of F contents represented by the set $\mathcal{F} = \{1, 2, \ldots, f, \ldots, F\}$, where each content is of equal size $|f|$. We use a Zipf distribution to capture the spatial heterogeneity of the data popularity profile [31] as follows:

$$\vartheta_f = \frac{f^{-\gamma}}{\sum_{i=1}^{F} i^{-\gamma}},$$

where $\gamma \geq 0$ denotes the content popularity distribution skewness that controls the spatial heterogeneity of content popularity. It is imperative to design an optimal caching strategy X to improve data access via cache resources. We define a decision variable $x_s^f$ to determine whether content f is cached by serving SBS s or not: if $x_s^f = 1$, content f is cached by SBS s; otherwise it is equal to 0. Thus, when a user u requests content f from serving SBS s, SBS s decides on the transmission of content f based on the caching strategy X.
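As a quick illustration, the Zipf profile above can be computed in a few lines of Python; F = 400 and γ = 1 mirror the simulation settings used later (Table 2 and Figures 3–8):

```python
import numpy as np

def zipf_popularity(F, gamma):
    """Request probability of the f-th most popular content:
    f^(-gamma) / sum_i i^(-gamma), the Zipf profile defined above."""
    ranks = np.arange(1, F + 1)
    weights = ranks ** (-float(gamma))
    return weights / weights.sum()

theta = zipf_popularity(F=400, gamma=1.0)
print(theta[:3], theta.sum())  # top contents dominate; probabilities sum to 1
```

Raising γ concentrates the probability mass on the top-ranked contents, which is the effect studied in Figures 4 and 7.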

3.1. Mobility

In a wireless network, users exhibit mobility, which complicates data caching. Therefore, it is imperative to examine the influence of users' mobility on data access through the cached resources. When users roam around, their connections to serving SBSs become unstable, which requires an effective data caching mechanism to provide users with their desired contents efficiently. Users' mobility patterns are observed by dividing the communication time between a user and its serving SBS into T time slots, expressed as $\mathcal{T} = \{1, 2, \ldots, t, \ldots, T\}$. A user is considered static in the first time slot and moves to another position in the second time slot. In this manner, the communication time period between user u and serving SBS s is represented as

$$t_{s,u} = \left\{\, t - t_o : \left\lVert \rho_s^t - \rho_u^t \right\rVert < R_{su},\ t > t_o \,\right\},$$

where $t_o$ denotes the start of the contact between serving SBS s and user u, and $R_{su}$ is the coverage radius within which user u is served by SBS s. $\rho_s^t$ and $\rho_u^t$ denote the locations of SBS s and user u at time t, respectively. $t_{s,u}$ follows an exponential distribution with parameter $\lambda_{s,u}$, which determines the contact pattern between SBS s and user u.
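As a rough illustration of this contact model, the following sketch draws exponentially distributed contact durations and converts them into whole time slots; the mean contact time and slot length are assumed values, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def sample_contact_slots(lam_su, delta_t, n=5):
    """Draw contact durations t_{s,u} ~ Exp(lam_su) between SBS s and
    user u, then convert them into whole time slots of length delta_t."""
    durations = rng.exponential(scale=1.0 / lam_su, size=n)
    return np.maximum(1, np.floor(durations / delta_t)).astype(int)

# Example: mean contact time 1/lam_su = 10 s, slot length 1 s (assumed).
print(sample_contact_slots(lam_su=0.1, delta_t=1.0))
```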

3.2. Transmission Model

When a user u demands a content f from its serving SBS s, the desired content is transmitted with a data transmission rate [32] evaluated as

$$R_\mu = W \log_2 \left( 1 + \Gamma_\mu \right),$$

where W is the bandwidth allocated to user u and $\Gamma_\mu$ represents the received signal to interference plus noise ratio, evaluated as

$$\Gamma_\mu = \frac{P_s\, d_{u,s}}{\sigma^2 + I_s},$$

where $P_s$ represents the transmission power of SBS s and $d_{u,s}$ captures the path loss over the distance between serving SBS s and user u, so that $P_s d_{u,s}$ is the received signal power. $\sigma^2$ represents the background noise power (cf. Table 2), and $I_s$ represents the inter-cell interference. By utilizing interference mitigation mechanisms such as spectrum reuse, power control, and interference alignment, the inter-cell interference can be treated as constant noise.
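For illustration, the transmission model can be evaluated numerically; $P_s = 30$ dBm and W = 20 MHz follow Table 2, the noise power integrates the −174 dBm/Hz background noise over the bandwidth, and the link gain is an assumed placeholder for the path loss term $d_{u,s}$:

```python
import numpy as np

def sinr(p_s_dbm, link_gain, w_hz, interference_mw=0.0):
    """Received SINR: transmit power times the path loss term, over the
    background noise (integrated over the bandwidth) plus a constant
    inter-cell interference term; powers in mW."""
    p_rx_mw = 10 ** (p_s_dbm / 10.0) * link_gain
    noise_mw = 10 ** ((-174 + 10 * np.log10(w_hz)) / 10.0)  # -174 dBm/Hz
    return p_rx_mw / (noise_mw + interference_mw)

def rate_bps(w_hz, gamma_mu):
    """Shannon rate R_mu = W * log2(1 + Gamma_mu)."""
    return w_hz * np.log2(1.0 + gamma_mu)

g = sinr(p_s_dbm=30, link_gain=1e-9, w_hz=20e6)  # link gain is assumed
print(f"SINR = {10 * np.log10(g):.1f} dB, rate = {rate_bps(20e6, g)/1e6:.1f} Mbps")
```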

4. Problem Formulation

Let $D_u^f(X)$ represent the average delay for user u to access content f from its serving SBS s under caching strategy X. At least $T_u^f(X)$ time slots are required for successful transmission of content f. Thus, the minimum number of time slots for successful data transmission from the serving SBS s to the user u is determined as

$$T_u^f = \arg\min \left\{ T : T\, R_\mu(X, s)\, \Delta t \geq |f| \right\},$$

where $\Delta t$ represents the duration of a complete time slot. Thus, for user u, the average delay of receiving content f is determined as

$$D_u^f(X) = \mathbb{E}\left[ T_u^f(X) \right] \Delta t.$$

The delay experienced by user u in obtaining content f from serving SBS s is evaluated as

$$D_u^f = \frac{|f|}{\mathbb{E}[R_\mu]}.$$

When content f is not cached by serving SBS s, content f is first fetched from the core network through a backhaul link before being transferred to the demanding user:

$$D_u^f = D + \frac{|f|}{\mathbb{E}[R_\mu]},$$

where $D$ represents the extra delay of data transmission from the core network to serving SBS s. According to [33], $D$ is much larger than $\frac{|f|}{\mathbb{E}[R_\mu]}$, and the impact of X on $D$ is negligible. Therefore, the average delay $D$ is fixed and can be determined by the average time of downloading content f from the core network to the serving SBS s of user u.
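A tiny sketch (with illustrative numbers, not values from the paper) makes the hit/miss asymmetry of these two delay expressions explicit:

```python
def access_delay(cached, file_bits, mean_rate_bps, core_delay_s):
    """Delay of serving one content: |f| / E[R_mu] on a cache hit, plus
    the fixed core-network fetch delay D on a cache miss."""
    delay = file_bits / mean_rate_bps
    if not cached:
        delay += core_delay_s        # content first travels over backhaul
    return delay

# Assumed values: 10 MB content, 50 Mbps mean rate, 2 s core fetch delay.
print(access_delay(True, 8e7, 50e6, 2.0))   # cache hit:  1.6 s
print(access_delay(False, 8e7, 50e6, 2.0))  # cache miss: 3.6 s
```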
In this work, our objective is network delay minimization while taking into account users' evolving data preferences. We formulate our objective function as

$$\min_{x_s^f}\; D(X) = \frac{1}{U} \sum_{u=1}^{U} \sum_{f=1}^{F} \vartheta_f\, D_u^f(X)$$

$$\text{s.t.} \quad \sum_{f=1}^{F} x_s^f \leq C,$$

$$x_s^f \in \{0, 1\}, \quad \forall f \in \mathcal{F},$$

where the first constraint imposes the cache capacity limit on the SBS and the second restricts the caching decisions to binary values. Our problem is NP-hard; hence, it is challenging to find an optimal caching solution. Therefore, we utilize a machine learning approach known as Reinforcement Learning (RL) to optimize data caching in wireless networks while taking into account the limited cache capacity and users' diverse data demands.
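To ground the formulation, the following sketch evaluates the objective exhaustively on a toy instance; the sizes and per-content delays are assumed for illustration. The number of feasible placements grows combinatorially, which is exactly why exhaustive search does not scale and an RL approach is adopted instead:

```python
import itertools

import numpy as np

# Toy instance (assumed sizes): F = 6 contents, cache capacity C = 2.
F, C = 6, 2
ranks = np.arange(1, F + 1)
theta = ranks ** -1.0 / (ranks ** -1.0).sum()   # Zipf profile, gamma = 1
hit_delay, miss_delay = 1.6, 3.6                # assumed per-content delays (s)

def avg_delay(cache):
    """Objective D(X): popularity-weighted delay over cache hits and misses."""
    return sum(theta[f] * (hit_delay if f in cache else miss_delay)
               for f in range(F))

# Exhaustive search over all C-subsets of the library -- feasible only here.
best = min(itertools.combinations(range(F), C), key=avg_delay)
print(best, round(avg_delay(best), 3))          # caches the most popular contents
```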

5. Q-Learning Based Data Caching

An illustration of RL-based data caching is provided in Figure 2. The efficiency of the data caching mechanism is measured by the delay experienced by users accessing data from the serving SBS. Therefore, we devise Q-learning based data caching at the SBS level. Q-learning is constituted by states, actions, and rewards.
  • State: The state set $\mathcal{M}$ collects the cache states of serving SBS s, $\mathcal{M} = \{ m_s : s \in \mathcal{S} \}$, where $m_s$ represents the content placement of SBS s, given as $m_s = (m_s(1), m_s(2), \ldots, m_s(f), \ldots, m_s(F))$.
  • Action: The action set $\mathcal{A}$ contains the actions that adjust caching at SBS s. We define the action vector as $a_s = (a_s(1), a_s(2), \ldots, a_s(f), \ldots, a_s(F))$, which performs an adjustment of the cached contents in SBS s.
  • Reward: $\psi(m, a)$ is the reward provided to the SBS in return for performing action a in state m, defined as the expected reduction in data access delay achieved through the cache resource.
The process of caching in SBS s at time t proceeds as follows:
  • The current cache state of SBS s is sensed at time t, i.e., $m_s^t \in \mathcal{M}$.
  • Based on the current cache state $m_s^t \in \mathcal{M}$, an action $a_s^t \in \mathcal{A}$ is selected by SBS s.
  • The system transitions to a new state $m_s^{t+1} \in \mathcal{M}$ as a result of action $a_s^t \in \mathcal{A}$. Moreover, as a result of this transition, a reward $\psi_s^t = \psi(m_s^t, a_s^t)$ is obtained.
  • The reward is provided to the SBS s and the whole process is repeated.
Our objective is to enhance rewards to reduce data access delay via cached resources. To increase rewards, it is necessary to perform optimal actions. A state-value function V is defined to determine data caching efficiency in the following manner:
$$V = \sum_{t=0}^{\infty} \alpha^t \psi_t,$$

where $\alpha \in (0, 1)$ represents the discount factor that determines the effect of future rewards on current action decisions.
We utilize Q-learning to determine the optimal caching policy $\zeta(s)$ for SBS s, which corresponds to V without prior system information. Since the cache decision is taken in an independent manner, the optimal $V_s$ for SBS s is determined as follows:

$$V_s = \sum_{t=0}^{\infty} \alpha^t \psi_s^t.$$
Moreover, $V_s$ is estimated by defining a Q-value for every state-action pair. The optimal Q-value for SBS s is defined as

$$Q_s\left(m_s^t, a_s^t\right) = \psi\left(m_s^t, a_s^t\right) + \alpha \sum_{m_s^{t+1} \in \mathcal{M}} P_{m_s^t, m_s^{t+1}}\left(a_s^t\right) \max_{a_s^{t+1} \in \mathcal{A}} Q\left(m_s^{t+1}, a_s^{t+1}\right),$$

where $m_s^t$ represents the current state of SBS s, $m_s^{t+1}$ represents the state reached by performing action $a_s^t$ in state $m_s^t$, and $P_{m_s^t, m_s^{t+1}}(a_s^t)$ represents the probability of the system transitioning from state $m_s^t$ to $m_s^{t+1}$. $\psi(m_s^t, a_s^t)$ is evaluated as the difference between the gain of caching fresh contents and the loss of removing old contents. The gain is
$$\upsilon_s^t(f^+) = \sum_{f=1}^{F} r_s^t(f^+),$$

where $\upsilon_s^t(f^+)$ represents the increase in data access via cache resources resulting from caching new contents. The loss of replacing old contents is evaluated as

$$\upsilon_s^t(f^-) = \sum_{f=1}^{F} r_s^t(f^-),$$

where $\upsilon_s^t(f^-)$ represents the decrease in data access via cache resources resulting from removing previous contents. The reward function is then determined as

$$\psi\left(m_s^t, a_s^t\right) = \upsilon_s^t(f^+) - \upsilon_s^t(f^-).$$
The relationship between the value function $V_s$ and the Q-value is then

$$V_s = \max_{a_s^t \in \mathcal{A}} Q_s\left(m_s^t, a_s^t\right).$$

If the optimal Q-value is known for every state-action pair, then the optimal policy is determined as

$$\zeta(s) = \arg\max_{a_s^t \in \mathcal{A}} Q_s\left(m_s^t, a_s^t\right).$$
$Q(m, a)$ is determined by the Q-learning algorithm in a recursive manner, utilizing the following update rule at the SBS [34]:

$$Q_s^{t+1}\left(m_s^t, a_s^t\right) = (1 - \psi)\, Q_s^t\left(m_s^t, a_s^t\right) + \psi \left[ \psi_s^t\left(m_s^t, a_s^t\right) + \alpha\, V_s^t\left(m_s^t + a_s^t\right) \right],$$

where $\psi$ denotes the learning rate (see Table 2), $\alpha$ the discount factor, and $m_s^t + a_s^t$ the state reached from $m_s^t$ as a result of action $a_s^t$ at time t, with $V_s^t\left(m_s^t + a_s^t\right) = \max_{a_s} Q_s^t\left(m_s^t + a_s^t, a_s\right)$.
The pseudo code of the proposed Q-learning based caching algorithm is provided in Algorithm 1. First, the state-action pairs are initialized for the Q-learning caching mechanism. Then, actions $a_s$ are selected by iteratively evaluating state-action pairs to maximize the Q-function. As better actions are selected, the system rewards increase according to the reward function defined above. Thus, SBS s can cache contents that potentially improve the long-term reward. In steps 9 and 10, the reward is obtained and the system transitions to a new state. Finally, the Q-table is updated based on the rewards achieved. The evaluation step of the proposed scheme has complexity $O(|\mathcal{M}|^3)$. Moreover, Q-values are updated in each iteration; therefore, each update has complexity $O(|\mathcal{A}||\mathcal{M}|^2)$.
Algorithm 1 Q-Learning based Data Caching at the SBS level.
1: Initialize the Q-value $Q_s(m_s^t, a_s^t)$ for every state-action pair.
2: for $t = 1, 2, \ldots, T$ do
3:    Choose a random probability p.
4:    if $p \geq \epsilon$,
5:        $a_s^t = \arg\max_{a_s^t \in \mathcal{A}} Q_s(m_s^t, a_s^t)$,
6:    otherwise,
7:        randomly select an action $a_s^t$.
8:    Execute action $a_s^t$ in the system.
9:    Obtain reward $\psi_s^t$.
10:   $m_s^t$ transitions to the next state $m_s^{t+1}$; enter the next interval.
11:   Update $Q_s^{t+1}(m_s^t, a_s^t) = (1 - \psi) Q_s^t(m_s^t, a_s^t) + \psi \left[ \psi_s^t(m_s^t, a_s^t) + \alpha V_s(m_s^t + a_s^t) \right]$.
12: end for
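For concreteness, the following self-contained Python sketch mirrors Algorithm 1 on a deliberately small toy instance; the encoding of states as sets of cached contents, the swap-style actions, and the popularity-difference reward are simplifying assumptions made here so that the Q-table stays enumerable. The values ϵ = 0.5, ψ = 0.6, and α = 0.7 follow Section 6:

```python
import itertools
import random

import numpy as np

# Toy instance (assumed sizes): F = 6 contents, cache capacity C = 2.
F, C = 6, 2
ranks = np.arange(1, F + 1)
theta = ranks ** -1.0 / (ranks ** -1.0).sum()    # Zipf profile, gamma = 1

states = [frozenset(s) for s in itertools.combinations(range(F), C)]

def actions(m):
    """Keep the cache unchanged (None), or swap one cached content out."""
    return [None] + [(old, new) for old in m for new in range(F) if new not in m]

def step(m, a):
    """Apply an action; the reward is the popularity gained by the fresh
    content minus the popularity lost by the evicted one (Section 5)."""
    if a is None:
        return m, 0.0
    old, new = a
    return frozenset(m - {old} | {new}), theta[new] - theta[old]

epsilon, psi, alpha = 0.5, 0.6, 0.7              # exploration / learning / discount
Q = {(m, a): 0.0 for m in states for a in actions(m)}

m = random.choice(states)
for t in range(20000):                           # main loop of Algorithm 1
    if random.random() >= epsilon:               # exploit: greedy action
        a = max(actions(m), key=lambda act: Q[(m, act)])
    else:                                        # explore: random action
        a = random.choice(actions(m))
    m_next, reward = step(m, a)
    v_next = max(Q[(m_next, a2)] for a2 in actions(m_next))
    Q[(m, a)] = (1 - psi) * Q[(m, a)] + psi * (reward + alpha * v_next)
    m = m_next

best = max(states, key=lambda s: max(Q[(s, a)] for a in actions(s)))
print("learned cache set:", sorted(best))        # the most popular contents
```

Under the Zipf profile, the swap reward steers the learned policy toward caching the most popular contents, which is the behavior the delay objective of Section 4 favors.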

6. Performance Evaluation

A realistic mobility trajectory collected at the Korea Advanced Institute of Science & Technology (KAIST), Korea, is used to model the mobility patterns of users; the mobility traces were recorded with Garmin GPS 60CSx receivers. We exploit the ϵ-greedy policy to maintain a balance between exploration and exploitation of actions in order to maximize the reward. The value of ϵ can be adjusted within (0, 1) [35]; we set ϵ = 0.5 to explore actions that maximize the reward. Moreover, ψ = 0.6 and α = 0.7 are set. All simulations are performed with Matlab R2017a, and all simulation parameters are listed in Table 2.
The performance of our proposed RL-based caching mechanism is validated against state-of-the-art caching schemes:
  • Learning based Branch & Bound (LB&B): LB&B enables learning from patterns of users’ data demands based on their data viewing preferences.
  • Least Frequently Used Contents Caching (LFUCC): The frequency of users' data requests follows a diverse data popularity profile. Therefore, LFUCC uses the data demand frequency and content popularity to improve data access via cache resources.
  • Highly Popular Contents Caching (HPCC): HPCC enables caching highly demanded contents by the users to conserve limited cache resources.
  • Randomized Contents Caching (RCC): RCC caches contents randomly to reduce complexity without degrading data service quality.
In Figure 3, the impact of the data library size on data access delay via cache resources is observed. The obtained results depict a low delay for a small number of contents, because data requests target only these contents and this small set can be accommodated in the limited cache resources to minimize data access delay. However, when the number of contents is large, users have more content choices, which causes diversity in data demands. This makes it challenging to optimize data caching to improve data access via cache resources; whenever data access via the cache fails, contents are fetched from the core network with a large delay. Our proposed caching scheme achieves a low data access delay by exploiting Q-learning and users' mobility patterns. When there are 300 contents, our proposed scheme achieves 4%, 8%, 9.8%, and 13.5% lower delay than LB&B, LFUCC, HPCC, and RCC, respectively. Hence, the performance gain of the proposed caching scheme is high, owing to intelligent cache decisions based on the available cache resources and users' data demands.
In Figure 4, the influence of the data popularity profile on the cache decisions of our proposed caching mechanism is observed; the popularity skewness is varied for this purpose. The obtained results show that when the popularity skewness is high, the data access delay is low. This is because fewer contents are responsible for most of the data traffic, and these can be placed in the cache resources, improving the potential of data access via the cache. The results also make it evident that our proposed caching mechanism outperforms all the baseline schemes. The delay is greatly reduced as the popularity skewness is varied from 0.5 to 5; however, beyond a popularity skewness of 2.5, the delay response is stable because the contents generating data requests are already cached. Hence, popularity skewness plays a significant role in driving users' data viewing patterns, and our proposed caching mechanism efficiently caches contents and improves content access via cache resources.
Figure 5 shows the performance gain of our proposed caching mechanism under an increasing number of users. The efficiency of the proposed caching mechanism is clear in this result, because an increasing number of users implies more diverse data demands. However, all contents cannot be cached due to the cache constraint; therefore, optimal cache decisions are necessary. For instance, when there are 30 users, our proposed caching mechanism achieves 3.5%, 5.5%, 7.2%, and 12% lower delay than LB&B, LFUCC, HPCC, and RCC, respectively. This shows that our proposed caching mechanism can intelligently decide which contents should be cached based on evolving user data requests and positions.
In Figure 6, we analyze the impact of the cache capacity on the data access delay. Obviously, when the cache capacity is large, more contents can be cached, which enables more data access through the cache resources. When the cache capacity is 60, our proposed caching mechanism achieves 2.6%, 3.3%, 7%, and 8.2% lower delay than LB&B, LFUCC, HPCC, and RCC, respectively. From the obtained results, it is clear that our scheme efficiently utilizes the cache resources, even at low cache capacity, by accurately predicting users' data demands and performing caching accordingly.
In Figure 7, the impact of γ on traffic offloading via cache resources is analyzed. The obtained results show that more contents are accessed through cache resources as γ increases. This is because, with an increasing value of γ, fewer contents are responsible for most of the data traffic, and these are cached; hence, more traffic is offloaded through cache resources. For γ = 2, our proposed mechanism performs 18%, 47.5%, 90%, and 96% more data offloading through cache resources than LB&B, LFUCC, HPCC, and RCC, respectively. Hence, our proposed mechanism optimizes data placement and efficiently utilizes the limited cache resources.
In Figure 8, the improvement in backhaul saving resulting from increasing cache capacity is analyzed. It is clear from the obtained results that when the cache capacity is large, more data requests are fulfilled via cache resources and backhaul resources are conserved. For instance, when the cache capacity is 45, our proposed mechanism achieves 6%, 35%, 64%, and 74% higher backhaul saving than LB&B, LFUCC, HPCC, and RCC, respectively. These results demonstrate that our proposed mechanism can accurately place contents in the cache resources by observing users' data demands.

7. Conclusions

Our work addressed the data caching problem in 5G cellular networks. We developed a data caching scheme for wireless networks that considers the diverse data demands and mobility patterns of each user. Q-learning is utilized to optimize data placement at the SBS level in an online fashion, where no prior information on users' data requests is available. The proposed RL-based data caching can adaptively tune its policy to spatio-temporal variations in the data popularity profile and users' data demands via lightweight updates while considering users' mobility patterns. We conducted extensive experiments to demonstrate that our proposed caching scheme minimizes network delay by substantially increasing data access via cache resources. For performance realization, we analyzed the network delay under various system parameters, and the obtained results illustrated a significant reduction in network delay. Hence, it can be concluded that our proposed caching scheme delivers advantages under all the considered network parameters and transmission modes. In future work, we will optimize the QoE of individual users in vehicular networks. This requires customized down-sampling to accommodate the streaming bit rate requirements of users within a limited time period due to vehicle speed; providing optimal user scheduling and bit rate selection is challenging due to the massive number of contents, diverse data demands, and different vehicle speeds. The Q-learning approach is limited in dealing with high-dimensional data and large numbers of states and actions. We will therefore exploit the potential of Deep Q-learning to handle the large state and action spaces of complex problems while providing accurate function approximation for estimating value functions.

Author Contributions

Conceptualization and methodology, M.S., S.I., M.A. and L.W.; Supervision, S.S., M.A., R.U. and L.W.; software, M.S. and R.U.; writing, M.S., M.I. and M.A.; review and editing, M.S., L.W., J.K. and M.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research project is funded by the Thailand Science Research and Innovation Fund, Chulalongkorn University (CU_FRB65_ind (12)_160_21_26), and the Second Century Fund (C2F), Chulalongkorn University.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data are included in the study.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Cisco Visual Networking Index. Global Mobile Data Traffic Forecast Update, 2016–2021 White Paper; Cisco: San Jose, CA, USA, 2017.
  2. Yao, J.; Han, T.; Ansari, N. On mobile edge caching. IEEE Commun. Surv. Tutor. 2019, 21, 2525–2553.
  3. Zhang, S.; Zhang, N.; Yang, P.; Shen, X. Cost-effective cache deployment in mobile heterogeneous networks. IEEE Trans. Veh. Technol. 2017, 66, 11264–11276.
  4. Sun, X.; Ansari, N. Latency aware workload offloading in the cloudlet network. IEEE Commun. Lett. 2017, 21, 1481–1484.
  5. Nguyen, T.M.; Ajib, W.; Assi, C. Designing wireless backhaul heterogeneous networks with small cell buffering. IEEE Trans. Commun. 2018, 66, 4596–4610.
  6. Li, L.; Zhao, G.; Blum, R.S. A survey of caching techniques in cellular networks: Research issues and challenges in content placement and delivery strategies. IEEE Commun. Surv. Tutor. 2018, 20, 1710–1732.
  7. Paschos, G.; Bastug, E.; Land, I.; Caire, G.; Debbah, M. Wireless caching: Technical misconceptions and business barriers. IEEE Commun. Mag. 2016, 54, 16–22.
  8. Poularakis, K.; Tassiulas, L. Code, cache and deliver on the move: A novel caching paradigm in hyper-dense small-cell networks. IEEE Trans. Mob. Comput. 2017, 16, 675–687.
  9. Nie, W.; Zheng, F.C.; Wang, X.; Zhang, W.; Jin, S. User-centric cross-tier base station clustering and cooperation in heterogeneous networks: Rate improvement and energy saving. IEEE J. Sel. Areas Commun. 2016, 34, 1192–1206.
  10. Song, X.; Geng, Y.; Meng, X.; Liu, J.; Lei, W.; Wen, Y. Cache-enabled device to device networks with contention-based multimedia delivery. IEEE Access 2017, 5, 3228–3239.
  11. Gong, J.; Zhou, S.; Zhou, Z.; Niu, Z. Policy optimization for content push via energy harvesting small cells in heterogeneous networks. IEEE Trans. Wirel. Commun. 2017, 16, 717–729.
  12. Liu, D.; Yang, C. Energy efficiency of downlink networks with caching at base stations. IEEE J. Sel. Areas Commun. 2016, 34, 907–922.
  13. Wang, R.; Peng, X.; Zhang, J.; Letaief, K.B. Mobility-aware caching for content-centric wireless networks: Modeling and methodology. IEEE Commun. Mag. 2016, 54, 77–83.
  14. Zhang, W.; Wu, D.; Yang, W.; Cai, Y. Caching on the move: A user interest-driven caching strategy for D2D content sharing. IEEE Trans. Veh. Technol. 2019, 68, 2958–2971.
  15. Poularakis, K.; Iosifidis, G.; Sourlas, V.; Tassiulas, L. Multicast-aware caching for small cell networks. In Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC), Istanbul, Turkey, 6–9 April 2014; pp. 2300–2305.
  16. Golrezaei, N.; Mansourifard, P.; Molisch, A.F.; Dimakis, A.G. Base-station assisted device-to-device communications for high-throughput wireless video networks. IEEE Trans. Wirel. Commun. 2014, 13, 3665–3676.
  17. Wang, J. A survey of web caching schemes for the internet. ACM SIGCOMM Comput. Commun. Rev. 1999, 29, 36–46.
  18. Chae, S.H.; Choi, W. Caching placement in stochastic wireless caching helper networks: Channel selection diversity via caching. IEEE Trans. Wirel. Commun. 2016, 15, 6626–6637.
  19. He, J.; Song, W. Optimizing video request routing in mobile networks with built-in content caching. IEEE Trans. Mob. Comput. 2016, 15, 1714–1727.
  20. Zhou, H.; Wu, T.; Zhang, H.; Wu, J. Incentive-Driven Deep Reinforcement Learning for Content Caching and D2D Offloading. IEEE J. Sel. Areas Commun. 2021, 39, 2445–2460.
  21. Ning, Z.; Zhang, K.; Wang, X.; Guo, L.; Hu, X.; Huang, J.; Hu, B.; Kwok, R.Y. Intelligent Edge Computing in Internet of Vehicles: A Joint Computation Offloading and Caching Solution. IEEE Trans. Intell. Transp. Syst. 2021, 22, 2212–2225.
  22. Choi, H.; Park, S. Learning Future Reference Patterns for Efficient Cache Replacement Decisions. IEEE Access 2022, 10, 25922–25934.
  23. Chen, Z.; Lee, J.; Quek, T.Q.; Kountouris, M. Cooperative caching and transmission design in cluster-centric small cell networks. IEEE Trans. Wirel. Commun. 2017, 16, 3401–3415.
  24. Chen, M.; Hao, Y.; Hu, L.; Huang, K.; Lau, V.K. Green and Mobility-Aware Caching in 5G Networks. IEEE Trans. Wirel. Commun. 2017, 16, 8347–8361.
  25. Mustafa, N.; Khan, I.U.; Khan, M.A.; Uzmi, Z.A. Social Groups Based Content Caching in Wireless Networks. In Proceedings of the 19th ACM International Symposium on Mobility Management and Wireless Access, Alicante, Spain, 22–26 November 2021; pp. 133–136.
  26. Kalantari, E.; Yanikomeroglu, H.; Yongacoglu, A. Wireless Networks With Cache-Enabled and Backhaul-Limited Aerial Base Stations. IEEE Trans. Wirel. Commun. 2020, 19, 7363–7376.
  27. Shuja, J.; Bilal, K.; Alasmary, W.; Sinky, H.; Alanazi, E. Applying machine learning techniques for caching in next-generation edge networks: A comprehensive survey. J. Netw. Comput. Appl. 2021, 181, 103005.
  28. Doan, K.N.; Van Nguyen, T.; Quek, T.Q.; Shin, H. Content-Aware Proactive Caching for Backhaul Offloading in Cellular Network. IEEE Trans. Wirel. Commun. 2018, 17, 3128–3140.
  29. Xiang, H.; Peng, M.; Sun, Y.; Yan, S. Mode Selection and Resource Allocation in Sliced Fog Radio Access Networks: A Reinforcement Learning Approach. IEEE Trans. Veh. Technol. 2020, 69, 4271–4284.
  30. Ben-Ameur, A.; Araldo, A.; Chahed, T. Cache allocation in multi-tenant edge computing via online reinforcement learning. In Proceedings of the IEEE International Conference on Communications (ICC), Seoul, Korea, 16–22 May 2022.
  31. Rim, M.; Kang, C.G. Cache Partitioning and Caching Strategies for Device-to-Device Caching Systems. IEEE Access 2021, 9, 8192–8211.
  32. Fu, Q.; Yang, L.; Yu, B.; Wu, Y. Extensive Cooperative Content Caching and Delivery Scheme Based on Multicast for D2D-Enabled HetNets. IEEE Access 2021, 9, 40884–40902.
  33. Liu, J.; Bai, B.; Zhang, J.; Letaief, K.B. Cache Placement in Fog-RANs: From Centralized to Distributed Algorithms. IEEE Trans. Wirel. Commun. 2017, 16, 7039–7051.
  34. Sadeghi, A.; Sheikholeslami, F.; Giannakis, G.B. Optimal and Scalable Caching for 5G Using Reinforcement Learning of Space-Time Popularities. IEEE J. Sel. Top. Signal Process. 2018, 12, 180–190.
  35. Sutton, R.S.; Barto, A.G. Introduction to Reinforcement Learning; The MIT Press: Cambridge, MA, USA, 1998.
Figure 1. An illustration of data caching at SBSs.
Figure 2. An illustration of RL based data caching in a wireless network.
Figure 3. Impact of number of contents on the network delay when N = 50 and γ = 1.
Figure 4. Network delay vs. popularity skewness (γ) under various caching schemes when N = 50 and F = 400.
Figure 5. Network delay with respect to increasing users' density when N = 50, γ = 1, and F = 400.
Figure 6. Impact of increasing cache capacity on network delay when N = 50, F = 400, and γ = 1.
Figure 7. Impact of increasing popularity skewness γ on percentage traffic offloading when N = 50 and F = 400.
Figure 8. Impact of increasing cache capacity on backhaul saving when N = 50, F = 400, and γ = 1.
Table 1. Notations.

Parameter     Description
s             Small base stations
u             Users
l_{s,u}       Binary element to determine user u connectivity with serving SBS s
F             Number of contents
γ             Popularity skewness
x_s^f         Decision variable to determine whether content f is cached by SBS s or not
t_{s,u}       Communication time between user u and SBS s
R_μ           Data transmission rate
W             Bandwidth
P_s           Transmission power of SBS s
d_{u,s}       Distance between user u and SBS s
T_u^f         Time slots required for successful transmission of content f
D_u^f         Delay experienced by user u in receiving content f from SBS s
D             Data transmission delay from core network to SBS s
V             State value function
ψ_s^t         Transition reward of SBS s at time t
α             Discount factor
Table 2. System Parameters.

Parameter                     Notation    Value
Square area of small cell     A_sc        250 m × 250 m
Bandwidth                     W           20 MHz
Transmission power of SBS     P_s         30 dBm
Background noise              σ_μ²        −174 dBm/Hz
Discount factor               α           0.7
Learning rate                 ψ           0.8
Popularity skewness           γ           1
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
