Distribution of Multi MmWave UAV Mounted RIS Using Budget Constraint Multi-Player MAB

Mohamed, Ehab Mahmoud; Alnakhli, Mohammad; Hashima, Sherief; Abdel-Nasser, Mohamed

doi:10.3390/electronics12010012


by Ehab Mahmoud Mohamed ^1,2,*, Mohammad Alnakhli ^1, Sherief Hashima ^3,4 and Mohamed Abdel-Nasser ^2

^1 Department of Electrical Engineering, College of Engineering in Wadi Addawasir, Prince Sattam Bin Abdulaziz University, Wadi Addawasir 11991, Saudi Arabia

^2 Department of Electrical Engineering, Aswan University, Aswan 81542, Egypt

^3 Computational Learning Theory Team, RIKEN-Advanced Intelligence Project (AIP), Fukuoka 819-0395, Japan

^4 Engineering Department, Nuclear Research Center, Egyptian Atomic Energy Authority, Inshas, Cairo 13759, Egypt

^* Author to whom correspondence should be addressed.

Electronics 2023, 12(1), 12; https://doi.org/10.3390/electronics12010012

Submission received: 7 November 2022 / Revised: 6 December 2022 / Accepted: 15 December 2022 / Published: 20 December 2022

(This article belongs to the Special Issue Online Learning Aided Solutions for 6G Wireless Networks)


Abstract

Millimeter wave (mmWave), reconfigurable intelligent surfaces (RISs), and unmanned aerial vehicles (UAVs) are considered vital technologies for future sixth-generation (6G) communication networks. In this paper, multiple UAV-mounted RISs are distributed to support mmWave coverage over several hotspots where numerous users exist in a harsh blockage environment. The UAVs should be spread among the hotspots to maximize the hotspots' average achievable data rates while minimizing the UAVs' hovering and flying energy consumption. To efficiently address this combinatorial problem, whose search space grows rapidly with the numbers of UAVs and hotspots, it is formulated as a centralized budget constraint multi-player multi-armed bandit (BCMP-MAB) game. In this formulation, the UAVs act as the players, the hotspots as the arms, and the achievable sum rates of the hotspots as the profit of the MAB game. This MAB problem differs from the traditional one owing to the added constraints of the limited budget of the UAV batteries and of collision avoidance among UAVs, i.e., a hotspot should be covered by only one UAV at a time. Numerical analysis of different scenarios confirms the superior performance of the proposed BCMP-MAB algorithm over other benchmark schemes in terms of average sum rate and energy efficiency, with comparable computational complexity and convergence rate.

Keywords: UAV mounted RIS; MP-MAB; millimeter wave; hotspot

1. Introduction

Millimeter wave (mmWave), i.e., the 30 ∼ 300 GHz band, constitutes the cornerstone of the current fifth generation (5G) and the upcoming sixth generation (6G) networks [1], owing to its large available spectrum. However, its high operating frequency makes the mmWave signal weak and subject to poor channel conditions [2], which makes it prone to path blockage and human shadowing. Moreover, oxygen absorption severely degrades the quality of the mmWave link [3]. Therefore, antenna beamforming is recommended as an effective solution for overcoming mmWave channel impairments. This can be achieved through beamforming training (BT), in which the antenna elements are steered using structured codebooks [4].

In dense hotspot scenarios containing numerous mmWave users, mmWave coverage should be extended and strengthened to fully cover the hotspot users and overcome their mutual path blockage. In this regard, a reconfigurable intelligent surface (RIS) [5] can provide an effective solution. It is a promising 6G technology that can smartly reconfigure the wireless communication channel [5]. That is, an RIS board can effectively control the mmWave channel by reinforcing the received signal in some directions and weakening it in others [6]. Thus, it can provide additional non-line-of-sight (NLoS) paths to mmWave users inside their hotspots. This is achieved by passively controlling the electromagnetic (EM) wave incident on the RIS board through adjusting the phase shifts (PSs) of its antenna array [6]. In this way, the need for the complex RF chains of conventional relaying systems is greatly reduced [7]. Owing to its low cost and ease of installation, researchers have investigated the application of RIS in numerous wireless communication systems [8,9,10,11,12], including extending and strengthening mmWave coverage [13,14], which is also the goal of this paper. However, in scenarios with numerous hotspots, it is challenging to install RIS boards near each hotspot, especially for temporary hotspots such as stadiums, theaters, and markets. In these scenarios, unmanned aerial vehicles (UAVs) provide a practical and cost-effective solution, where the RIS boards are attached to the UAVs. These multiple UAV-mounted RISs are then distributed over the hotspot region, and each UAV serves a particular hotspot area. Recently, UAVs have received significant attention in wireless communication owing to their flying and maneuvering capabilities. For example, UAVs can be used as airborne base stations (BSs) to provide wireless connectivity in remote and post-disaster areas [15]. In addition, they can be used as relays to extend the coverage of mobile BSs [16]. Moreover, data collection tasks such as aerial photography and traffic and environmental monitoring can be conducted quickly by UAVs [17,18].

Herein, distributing the UAV-mounted RISs among the hotspots is challenging owing to the UAVs' limited battery capacity. Each UAV should cover a hotspot so as to maximize its achievable data rate while minimizing the UAV's flying and hovering energy consumption. This is a combinatorial problem whose search space grows rapidly with the numbers of UAVs and hotspots. In addition, the limited UAV battery capacity must be respected while solving the problem. Furthermore, collisions among UAVs should be avoided, i.e., no more than one UAV is permitted to cover a particular hotspot at a time. These constraints further complicate the optimization problem.

In this paper, online learning is used to efficiently address the problem of distributing multiple UAV-mounted RISs by treating it as a centralized budget constraint multi-player multi-armed bandit (BCMP-MAB) game. As a robust online learning tool, MAB can efficiently handle the fundamental exploration-exploitation trade-off. In this context, a MAB player seeks to maximize its profit by either exploiting the arm with the highest observed reward or exploring the less-selected arms [19,20], while observing only the rewards of the arms it has played [19]. The main contributions of this paper can be summarized as follows:

UAV-mounted RISs are used to extend and strengthen mmWave coverage in highly dense hotspot areas containing considerable numbers of users. The distribution of the UAVs among the hotspots is formulated as an optimization problem that maximizes the sum data rate of the hotspots while minimizing the flying and hovering energy consumption of the UAVs.
The aforementioned optimization problem is reformulated as a centralized budget constraint MP-MAB game, where the players are the UAVs, the arms of the bandit are the hotspots, and the rewards are the achievable data rates of the hotspots. The proposed BCMP-MAB differs from the conventional MP-MAB game owing to the added battery-budget and UAV collision-free constraints. Its centralized nature is used to avoid collisions among UAVs, and the budget constraint accounts for the limited battery capacity of the UAVs when selecting hotspots. To avoid such collisions, the UAVs make their UAV-hotspot selections autonomously and sequentially during the bandit game, through centralized orchestration and information about the currently uncovered hotspots provided by the mmWave BS.
Extensive numerical simulations under different scenarios show that the proposed BCMP-MAB algorithm outperforms other benchmark schemes.

The remainder of this paper is organized as follows: Section 2 reviews the related literature. Section 3 presents the proposed system model and the optimization problem formulation. Section 4 introduces the proposed BCMP-MAB algorithm, Section 5 presents the numerical analysis, and Section 6 gives the concluding remarks of this paper.

2. Related Works

Recently, a few research works have investigated the applications of RIS in mmWave communications. In [6,14], the coauthors of this paper proposed two-stage MAB schemes to find the optimal mmWave links from the BS to the RIS and from the RIS to the user equipment (UE) that maximize the achievable data rate at the UE. Static and adaptively mixed relay-RIS topologies were studied in [11], and they outperformed traditional benchmark RIS architectures. An amplify-and-forward (AF) relay employing RIS-aided mmWave was investigated in [13], and a precise expression of the signal-to-noise ratio (SNR) was given. Moreover, the authors developed an optimal power allocation approach for the AF relay to acquire the ideal PSs that optimize the end-to-end SNR. In [21], a dual methodology was proposed for designing the active precoders at the mmWave BS and the passive precoders at the RIS to maximize the achievable spectral efficiency of the RIS-aided mmWave system. In [22], a mathematical framework was proposed to analyze the coverage of RIS-enabled mmWave systems. Moreover, federated learning (FL) was used in [23] to optimize the performance of RIS-assisted mmWave systems. In [24,25], channel estimation for RIS-enabled mmWave systems was considered: [24] exploited the cascaded nature of the mmWave RIS channel, while [25] adopted atomic norm minimization. In [26], a hybrid precoding approach was proposed to adjust the analog/digital precoders of the mmWave BS as well as the PSs of the RIS board. A deep learning-empowered compressive sensing approach was proposed in [27] to adjust the precoders of the mmWave BS and the PSs of the RIS. The authors of [28] investigated machine learning (ML)-based beam management for RIS-aided mmWave communication. In [29], the passive precoding, power allocation, and user association of RIS-aided mmWave systems were jointly optimized using sequential fractional programming (SFP) and forward-reverse auction (FRA) techniques. In [30], the best beams and reflection coefficients for RIS-assisted mmWave were investigated. Despite this literature on RIS-aided mmWave communications, few works have investigated RIS-enabled UAVs for mmWave communications. In [31], the coauthors of this work studied the trajectory planning of a UAV-mounted RIS over multiple hotspot areas via single-player MABs to maximize its achievable data rate while minimizing its energy consumption along its path from one hotspot to another. However, that work considered only a single-UAV setting, and the problem of optimal UAV distribution was not examined. In [32], the authors showed the advantage of a UAV-mounted RIS over a fixed RIS in enhancing the coverage of mmWave users. In addition, they used deep reinforcement learning (DRL) to model the environment and optimize the performance of the mmWave UAV-mounted RIS system. In [33], the authors extended this work to jointly optimize the precoding matrix at the BS, the PSs at the RIS, and the location of the UAV-mounted RIS to maximize the total sum rate. However, in these two papers, only a single-UAV scenario was studied, and the optimal distribution of UAVs was not considered. In [34], a fixed RIS attached to a building was used to enhance the secrecy rate of mmWave UAV communication. However, no UAV-mounted RIS was proposed there; only a fixed RIS was used to assist the UAV acting as a flying BS. In [35], multiple fixed RIS boards were used to aid UAV-enabled mmWave cellular communications, where the RIS deployment, user scheduling, beamforming vectors, and RIS phases were jointly optimized to maximize the system sum rate. In [36], a fixed RIS board was used as an auxiliary to enhance the performance of UAV-enabled mmWave communications, and the power-delivering capability as well as the fading characteristics of the RIS were studied. Again, no UAV-mounted RIS was implemented in [35,36].

Table 1 summarizes the research work conducted on RIS-assisted mmWave UAV communications. To the best of our knowledge, no existing work has considered the optimal distribution of multiple UAV-mounted RISs over hotspot areas, which is the focus of this paper.

3. System Model and Optimization Problem Formulation

This section details the proposed system model, the adopted channel models, and the optimization problem formulation for distributing multiple UAVs among hotspots.

3.1. Proposed System Model

Figure 1 shows the proposed system model of mmWave UAV-mounted RIS for hotspot coverage. In this model, multiple RIS boards attached to UAVs are used to strengthen the coverage of the mmWave BS at hotspots containing different numbers of UEs, such as stadiums and markets. Every hotspot has a different traffic demand based on the needs of its associated users. A UAV-mounted RIS provides an additional mmWave path from the mmWave BS to the mmWave users inside the hotspot, as shown by the dashed red lines in Figure 1, whereas the green lines indicate the direct path from the mmWave BS to the stadium hotspot area. Typically, the number of hotspots is larger than the number of UAVs, and each hotspot should be served by only one UAV at a time. Based on the information about uncovered hotspots provided by the mmWave BS, a free UAV autonomously decides which of the uncovered hotspots it should fly towards and cover. Herein, we do not consider a fully centralized network, in which the mmWave BS fully controls the UAV-hotspot selection, in order to avoid high backhauling overhead, especially with a large number of UAVs; a fully centralized network will be the subject of our future investigations. The UAV-hotspot selection should maximize the achievable sum rate of the hotspots, based on the specifications of the attached RIS board, while minimizing the flying and hovering energy consumption of the UAV. After selecting a specific hotspot, the UAV informs the mmWave BS of its selection; the mmWave BS then controls the PSs of the attached RIS towards the chosen hotspot location and marks this hotspot as covered. This paper focuses on the optimal distribution of UAV-mounted RISs over the hotspots, while mmWave RIS channel estimation, the joint adjustment of BS active beamforming and RIS passive beamforming, and the effect of UAV turbulence on mmWave RIS channels are beyond its scope; these issues have been addressed in works such as [24,26]. The following two subsections present the adopted mmWave channel models and the optimization problem formulation of the UAV-hotspot distribution, respectively.

3.2. MmWave Channel Models

The received (RX) power $P_{r,nk_m}$ at UE $k$ in hotspot $m$ consists of two components: one comes directly from the mmWave BS (B) over the LoS path, and the other comes over the NLoS path provided by UAV $n$. Mathematically,

$$P_{r,nk_m} = P_{r,Bk_m} + P_{r,Bnk_m} \tag{1}$$

where $P_{r,Bk_m}$ denotes the direct LoS power component received by UE $k_m$ from the BS, and $P_{r,Bnk_m}$ denotes the NLoS component traced through UAV $n$. For $P_{r,Bk_m}$, the mmWave terrestrial link model given in [37] is utilized, where $P_{r,Bk_m}$ can be expressed as follows:

$$P_{r,Bk_m} = P_t\, A_{t,Bk_m}\!\left(\theta_{t,Bk_m},\theta_{-3\mathrm{dB}}\right) A_{r,k_mB}\!\left(\phi_{r,k_mB},\phi_{-3\mathrm{dB}}\right) \left(\frac{\eta\!\left(P_{LoS}(d_{Bk_m})\right)}{L_{LoS}(d_{Bk_m})} + \frac{\chi\!\left(P_{NLoS}(d_{Bk_m})\right)}{L_{NLoS}(d_{Bk_m})}\right) \tag{2}$$

In (2), $P_t$ is the transmit (TX) power of the mmWave BS, and $\eta(P_{LoS}(d_{Bk_m}))$ and $\chi(P_{NLoS}(d_{Bk_m}))$ are two Bernoulli random variables with success probabilities $P_{LoS}(d_{Bk_m})$ and $P_{NLoS}(d_{Bk_m}) = 1 - P_{LoS}(d_{Bk_m})$, respectively. These are the LoS and NLoS probabilities as functions of the separation distance $d_{Bk_m}$ between the mmWave BS and UE $k_m$, as shown in Figure 2.

$A_{t,Bk_m}(\theta_{t,Bk_m},\theta_{-3\mathrm{dB}})$ and $A_{r,k_mB}(\phi_{r,k_mB},\phi_{-3\mathrm{dB}})$ denote the TX and RX beamforming gains of the mmWave BS and UE $k_m$, respectively. Herein, $\theta_{t,Bk_m}$ and $\phi_{r,k_mB}$ are the boresight angles of the TX and RX beams, while $\theta_{-3\mathrm{dB}}$ and $\phi_{-3\mathrm{dB}}$ are their $-3$ dB beamwidths. Using the 2D steerable antenna model with a Gaussian main-lobe profile given in [37],

$A_{t,Bk_m}(\theta_{t,Bk_m},\theta_{-3\mathrm{dB}})$ can be expressed as follows:

$$A_{t,Bk_m}\!\left(\theta_{t,Bk_m},\theta_{-3\mathrm{dB}}\right) = A_0 \exp\!\left(-4\ln(2)\left(\frac{\theta-\theta_{t,Bk_m}}{\theta_{-3\mathrm{dB}}}\right)^{2}\right), \qquad A_0 = \left(\frac{1.6162}{\sin\!\left(\theta_{-3\mathrm{dB}}/2\right)}\right)^{2} \tag{3}$$

where $A_0$ is the maximum antenna gain. For $A_{r,k_mB}(\phi_{r,k_mB},\phi_{-3\mathrm{dB}})$, the same expression in (3) can be used with $\theta_{t,Bk_m}$ and $\theta_{-3\mathrm{dB}}$ replaced by $\phi_{r,k_mB}$ and $\phi_{-3\mathrm{dB}}$, respectively.
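As a quick numerical check of (3), the Gaussian main-lobe gain can be evaluated directly. The sketch below is a minimal illustration, assuming angles in radians and linear-scale gains; the function names and the 30° beamwidth in the usage lines are hypothetical choices, not values from this paper. By construction, an offset of half the beamwidth from boresight yields exactly half the peak gain, which confirms that $\theta_{-3\mathrm{dB}}$ in (3) is the full $-3$ dB beamwidth.

```python
import math


def max_gain(beamwidth_rad: float) -> float:
    """Peak gain A_0 = (1.6162 / sin(theta_3dB / 2))^2 from Eq. (3), linear scale."""
    return (1.6162 / math.sin(beamwidth_rad / 2.0)) ** 2


def gaussian_beam_gain(theta: float, boresight: float, beamwidth_rad: float) -> float:
    """Gain towards `theta` of a beam steered to `boresight` (radians).

    Implements A_0 * exp(-4 ln(2) * ((theta - boresight) / theta_3dB)^2).
    """
    offset = (theta - boresight) / beamwidth_rad
    return max_gain(beamwidth_rad) * math.exp(-4.0 * math.log(2.0) * offset ** 2)


if __name__ == "__main__":
    bw = math.radians(30.0)                       # hypothetical 30-degree beamwidth
    peak = gaussian_beam_gain(0.0, 0.0, bw)       # on boresight
    edge = gaussian_beam_gain(bw / 2.0, 0.0, bw)  # half a beamwidth off boresight
    print(f"peak = {10 * math.log10(peak):.2f} dBi, edge = {10 * math.log10(edge):.2f} dBi")
```

The edge value printed is 3 dB below the peak, so the same helper can be reused for every $A$ term in this section by passing the corresponding boresight angle and beamwidth.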

In (2), $L_{LoS}(d_{Bk_m})$ and $L_{NLoS}(d_{Bk_m})$ are the path losses of the LoS and NLoS paths as functions of the separation distance $d_{Bk_m}$. They can be expressed as follows:

$$10\log_{10}\!\left(L_v(d_{Bk_m})\right) = \beta_v + 10\,\alpha_v \log_{10}(d_{Bk_m}) + \varepsilon_v, \tag{4}$$

where $v \in \{\mathrm{LoS}, \mathrm{NLoS}\}$, $\beta_v = 82.02 - 10\,\alpha_v \log_{10}(d_0)$ is the intercept term, chosen so that the mean path loss at the reference distance $d_0$ equals 82.02 dB, $\alpha_v$ is the path loss exponent, and $\varepsilon_v \sim \mathcal{N}(0, \delta_v^2)$ is the log-normal shadowing term (in dB) with zero mean and standard deviation $\delta_v$. Readers are referred to [37] for the details behind these equations and their associated parameters.
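The path-loss model (4) and the direct-link power (2) can be sketched as follows. This is a minimal sketch under stated assumptions: the path loss exponents, shadowing deviations, and LoS probability in the usage lines are placeholders, not values from this paper or [37]; the LoS probability is passed in as a number rather than computed from a distance model; and a single Bernoulli draw decides the link state, so $\chi = 1 - \eta$ (the text only specifies the marginal probabilities of $\eta$ and $\chi$).

```python
import math
import random
from typing import Optional


def path_loss_db(d: float, alpha: float, d0: float,
                 sigma_db: float = 0.0, rng: Optional[random.Random] = None) -> float:
    """Eq. (4): 10 log10 L_v(d) = beta_v + 10 alpha_v log10(d) + eps_v, in dB.

    beta_v = 82.02 - 10 alpha_v log10(d0), so the mean loss is 82.02 dB at d = d0.
    eps_v ~ N(0, sigma_db^2) is the shadowing term; it is skipped if rng is None.
    """
    beta = 82.02 - 10.0 * alpha * math.log10(d0)
    eps = rng.gauss(0.0, sigma_db) if rng is not None else 0.0
    return beta + 10.0 * alpha * math.log10(d) + eps


def direct_rx_power(p_t: float, a_t: float, a_r: float, d: float, p_los: float,
                    alpha_los: float, alpha_nlos: float, d0: float,
                    rng: random.Random,
                    sigma_los: float = 0.0, sigma_nlos: float = 0.0) -> float:
    """Eq. (2): direct BS-UE received power, in the same linear units as p_t.

    Assumption: one Bernoulli draw sets the link state, so chi = 1 - eta.
    """
    eta = 1 if rng.random() < p_los else 0
    chi = 1 - eta
    l_los = 10.0 ** (path_loss_db(d, alpha_los, d0, sigma_los, rng) / 10.0)
    l_nlos = 10.0 ** (path_loss_db(d, alpha_nlos, d0, sigma_nlos, rng) / 10.0)
    return p_t * a_t * a_r * (eta / l_los + chi / l_nlos)


if __name__ == "__main__":
    rng = random.Random(7)
    # Placeholder parameters: alpha_LoS = 2, alpha_NLoS = 3.5, d0 = 5 m
    print(path_loss_db(50.0, 2.0, 5.0))
    print(direct_rx_power(1.0, 10.0, 10.0, 50.0, 0.6, 2.0, 3.5, 5.0, rng, 3.0, 5.0))
```

Keeping the Bernoulli draw inside `direct_rx_power` means that repeated calls with the same generator produce independent Monte Carlo realizations of the blockage state.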

The authors of [38] investigated the mmWave RX power received at a UE from the mmWave BS through a far-field RIS board, as in the case of the UAV-mounted RIS considered in this paper. They assumed that, owing to the far-field condition, all antenna elements of the RIS board experience the same gain as the board center. Thus, $P_{r,Bnk_m}$ can be expressed as:

$$P_{r,Bnk_m} = P_t \left(\frac{\lambda}{4\pi}\right)^{4} \left(Q_n \Gamma\right)^{2} A_{t,Bn}\!\left(\theta_{t,Bn},\theta_{-3\mathrm{dB}}\right) G_{r,nB}\!\left(\phi_{r,nB}\right) G_{t,nk_m}\!\left(\theta_{t,nk_m}\right) A_{r,k_mn}\!\left(\phi_{r,k_mn},\phi_{-3\mathrm{dB}}\right) \left(d_{Bn}\, d_{nk_m}\right)^{-\alpha} \tag{5}$$

where $P_t$ and $\lambda$ are the TX power and the wavelength of the mmWave BS signal, $Q_n$ is the number of antenna elements of the RIS board attached to UAV $n$, $\Gamma$ is the amplitude reflection coefficient of the RIS elements, and $\alpha$ is the path loss exponent. $d_{Bn}$ and $d_{nk_m}$ are the separation distances between the BS and UAV $n$, and between UAV $n$ and UE $k_m$, respectively, as shown in Figure 2, which presents the schematic diagram of the UAV-mounted RIS communication links.

$A_{t,Bn}(\theta_{t,Bn},\theta_{-3\mathrm{dB}})$ and $A_{r,k_mn}(\phi_{r,k_mn},\phi_{-3\mathrm{dB}})$ are the TX beamforming gain of the mmWave BS towards UAV $n$ and the RX beamforming gain of UE $k_m$ towards UAV $n$, respectively, where $\theta_{t,Bn}$ and $\phi_{r,k_mn}$ are the boresight angles of the beams, as shown in Figure 2, and $\theta_{-3\mathrm{dB}}$ and $\phi_{-3\mathrm{dB}}$ are their $-3$ dB beamwidths. The values of $A_{t,Bn}(\theta_{t,Bn},\theta_{-3\mathrm{dB}})$ and $A_{r,k_mn}(\phi_{r,k_mn},\phi_{-3\mathrm{dB}})$ can be calculated using (3) with their respective parameters. In (5), $G_{t,nk_m}(\theta_{t,nk_m})$ and $G_{r,nB}(\phi_{r,nB})$ are the TX gain from UAV $n$ towards UE $k_m$ and the RX gain at UAV $n$ from the mmWave BS, respectively, where $\theta_{t,nk_m}$ and $\phi_{r,nB}$ are the boresight angles of the beams, as shown in Figure 2. $G_{t,nk_m}(\theta_{t,nk_m})$ can be expressed as [38]:

$$G_{t,nk_m}\!\left(\theta_{t,nk_m}\right) = 4\cos\!\left(\theta_{t,nk_m}\right), \tag{6}$$

The same expression can be used to calculate $G_{r,nB}(\phi_{r,nB})$ by replacing $\theta_{t,nk_m}$ with $\phi_{r,nB}$. As stated in [38], (6) matches field measurements well. For the detailed derivation of (5) and its associated parameters, readers are referred to [38] and the references therein. Thus, the spectral efficiency of UE $k_m$ served by UAV $n$ can be expressed as:

$$\psi_{nk_m} = \log_2\!\left(1 + P_{r,nk_m}/\sigma_0\right), \tag{7}$$

where $\sigma_0$ denotes the AWGN noise power.
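The RIS-assisted power (5), the element gain (6), the total power (1), and the spectral efficiency (7) can be combined into a short calculation. The sketch below assumes linear-scale gains, powers in watts, and angles in radians; all function names and the numbers in the usage lines (including the 28 GHz carrier) are hypothetical, not parameters of this paper. It makes explicit that the reflected power scales with $Q_n^2$, so doubling the number of RIS elements quadruples $P_{r,Bnk_m}$.

```python
import math


def ris_element_gain(angle_rad: float) -> float:
    """Eq. (6): G(theta) = 4 cos(theta), linear scale, as reported in [38]."""
    return 4.0 * math.cos(angle_rad)


def ris_rx_power(p_t: float, wavelength: float, q_n: int, gamma: float,
                 a_t_bn: float, a_r_kn: float, phi_r_nb: float, theta_t_nk: float,
                 d_bn: float, d_nk: float, alpha: float) -> float:
    """Eq. (5): far-field power received by UE k_m via the RIS on UAV n."""
    return (p_t * (wavelength / (4.0 * math.pi)) ** 4 * (q_n * gamma) ** 2
            * a_t_bn * ris_element_gain(phi_r_nb) * ris_element_gain(theta_t_nk)
            * a_r_kn * (d_bn * d_nk) ** (-alpha))


def spectral_efficiency(p_direct: float, p_ris: float, noise_power: float) -> float:
    """Eqs. (1) and (7): psi = log2(1 + (P_direct + P_RIS) / sigma_0)."""
    return math.log2(1.0 + (p_direct + p_ris) / noise_power)


if __name__ == "__main__":
    # Hypothetical numbers: 28 GHz carrier, 256-element RIS, 40 m and 30 m hops
    lam = 3e8 / 28e9
    p_ris = ris_rx_power(1.0, lam, 256, 0.9, 100.0, 10.0, 0.3, 0.5, 40.0, 30.0, 2.0)
    print(spectral_efficiency(0.0, p_ris, 1e-11))
```

Summing `spectral_efficiency` over the UEs of a hotspot gives the per-hotspot quantity used as the bandit reward in the rest of the paper.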

3.3. Optimization Problem Formulation of UAV-Hotspot Distribution

Assume that a set $\emptyset_M$ of $M$ hotspots is distributed in the mmWave BS area. In addition, a set $\emptyset_N$ of $N$ UAVs, where $N \le M$, flies to cover some of these hotspots by providing additional mmWave links to their associated UEs. These UAVs should be distributed among the hotspots to maximize the hotspots' achievable data rates while minimizing the UAVs' flying and hovering energy consumption, under the constraint that each uncovered hotspot is covered by only one UAV at a time. Mathematically, this optimization problem can be formulated as follows:

$$I_{MN}^{*} = \underset{I_{MN} \in \{0,1\}^{M\times N}}{\arg\max}\; \left( W \sum_{n=1}^{N}\sum_{m=1}^{M} I_{mn}\,\Psi_{mn} \right), \tag{8a}$$

$$\text{s.t.} \tag{8b}$$

$$I_{MN} \in \{0,1\}^{M\times N} \tag{8c}$$

$$\sum_{m=1}^{M} I_{mn} = 1, \quad \forall n \in \emptyset_N \tag{8d}$$

$$\sum_{n=1}^{N} I_{mn} < 2, \quad \forall m \in \emptyset_M \tag{8e}$$

$$E_{nm} \le E_b \tag{8f}$$

where $I_{MN}^{*}$ is the optimal UAV-hotspot assignment matrix, and $I_{MN} \in \{0,1\}^{M\times N}$ ranges over the space of all possible assignment matrices. $W$ is the available bandwidth, and $\Psi_{mn}$ is the total spectral efficiency in bps/Hz of hotspot $m$ when covered by UAV $n$, which can be expressed as:

$$\Psi_{mn} = \sum_{k_m=1}^{K_m} \psi_{nk_m}, \tag{9}$$

where $K_m$ is the total number of users in hotspot $m$. Constraints (8d) and (8e) guarantee that each UAV covers exactly one hotspot and that each hotspot is covered by at most one UAV. The fourth constraint, (8f), means that the energy $E_{nm}$ that UAV $n$ needs to cover hotspot $m$ is bounded by its battery capacity $E_b$, where $E_{nm}$ equals:

$$E_{nm} = P_f\, T_{f_n} + P_h\, T_{h_n}, \qquad T_{f_n} = \frac{d_{nm}}{V_f}, \qquad T_{h_n} = \frac{R_m}{W\,\Psi_{mn}} \tag{10}$$

where $P_f$ and $P_h$ are the UAV flying and hovering powers, while $T_{f_n}$ and $T_{h_n}$ are the flying and hovering periods. $T_{f_n}$ equals the separation distance $d_{nm}$ between the current position of UAV $n$ and the location of its chosen hotspot $m$, divided by its flying speed $V_f$. $T_{h_n}$ equals the traffic need $R_m$ of hotspot $m$ in bits, divided by the achievable data rate $W\Psi_{mn}$ in bps when the hotspot is covered by UAV $n$. Herein, $R_m = \sum_{k_m=1}^{K_m} R_{k_m}$, where $R_{k_m}$ is the traffic need of UE $k_m$ in bits. Solving (8) by exhaustive search requires examining $O\!\left(\frac{M!}{(M-N)!}\right)$ candidate assignments, i.e., the number of permutations of $N$ out of $M$, which grows combinatorially with $M$ and $N$; the budget constraint in (8f) further complicates the problem.
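To make this combinatorial growth concrete, the sketch below evaluates the energy model (10) and solves (8) by exhaustive search over all $M!/(M-N)!$ ordered hotspot choices for a toy instance. It is illustrative only, assuming precomputed $\Psi_{mn}$ and $E_{nm}$ tables; the function names and numbers are hypothetical, and this brute-force search is not the method proposed in this paper. Enumerating permutations enforces (8d) and (8e) by construction, and the battery budget (8f) is checked explicitly.

```python
import itertools
from typing import Optional, Sequence, Tuple


def uav_energy(p_fly: float, p_hover: float, dist: float, speed: float,
               traffic_bits: float, bandwidth: float, psi: float) -> float:
    """Eq. (10): E_nm = P_f * d_nm / V_f + P_h * R_m / (W * Psi_mn)."""
    t_fly = dist / speed                          # T_f: flying time
    t_hover = traffic_bits / (bandwidth * psi)    # T_h: time to deliver R_m bits
    return p_fly * t_fly + p_hover * t_hover


def brute_force_assignment(psi: Sequence[Sequence[float]],
                           energy: Sequence[Sequence[float]],
                           e_budget: float, bandwidth: float
                           ) -> Tuple[float, Optional[Tuple[int, ...]]]:
    """Exhaustively solve (8) for a tiny instance.

    psi[n][m]    : spectral efficiency of hotspot m when covered by UAV n (bps/Hz)
    energy[n][m] : energy UAV n needs to cover hotspot m (J)
    Returns (best sum rate in bps, assignment) with assignment[n] = hotspot of UAV n.
    """
    n_uav, n_hot = len(psi), len(psi[0])
    best_rate, best_assign = float("-inf"), None
    # Ordered choices of N distinct hotspots satisfy (8d) and (8e) by construction;
    # there are M!/(M-N)! of them.
    for assign in itertools.permutations(range(n_hot), n_uav):
        if any(energy[n][m] > e_budget for n, m in enumerate(assign)):
            continue                              # violates the budget (8f)
        rate = bandwidth * sum(psi[n][m] for n, m in enumerate(assign))
        if rate > best_rate:
            best_rate, best_assign = rate, assign
    return best_rate, best_assign


if __name__ == "__main__":
    psi = [[1.0, 5.0, 2.0], [4.0, 6.0, 1.0]]     # N = 2 UAVs, M = 3 hotspots
    energy = [[1.0] * 3 for _ in range(2)]
    print(brute_force_assignment(psi, energy, e_budget=10.0, bandwidth=1.0))
    # (9.0, (1, 0))
```

Note that UAV 1 alone would prefer hotspot 1, but the jointly optimal assignment gives it hotspot 0; with 100 hotspots, as in the simulations below, the number of permutations quickly makes this enumeration impractical.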

4. Proposed BCMP-MAB Algorithm

In this section, to address the above optimization problem, we reformulate it as a time-sequential optimization problem that aims to maximize the sum rates of the hotspots sequentially over time. An online learning algorithm based on the MAB framework is then proposed to address this time-sequential problem efficiently.

4.1. MAB Concept

In the MAB game, a player plays the arms of a bandit to maximize its achievable reward based on its observations of the played arms [19]. If the arms' rewards are drawn from independent and identically distributed (i.i.d.) distributions, the game is classified as a stochastic MAB; if they are generated arbitrarily, possibly by an adversary, it is classified as an adversarial MAB [19]. Exploitation and exploration are the two main phases conducted by MAB players: in the exploitation phase, the arm with the highest observed reward is selected, while in the exploration phase, the less-selected arms are tried [20]. In some cases, selecting an arm incurs a cost, a setting known as budget constraint MAB, in which the player tries to maximize its achievable profit while minimizing the cost paid for its selected arms [39]. In addition, MAB games can be classified as single-player MAB (SP-MAB) or MP-MAB based on the number of players involved. In MP-MAB, collisions between players may happen, i.e., two or more players may select the same arm simultaneously. Depending on the collision model, the arm's reward is either shared among the collided players or none of them receives any reward. To prevent collisions, some information should be shared among the players in a centralized manner: if the current players' selections are known beforehand, a new player can avoid them and play the game with the free arms only.
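The two collision models mentioned above can be illustrated with a few lines of code. This is a hypothetical sketch, not part of the proposed algorithm: `choices` lists the arm picked by each player in one round, `arm_rewards` holds each arm's reward in that round, and the `model` string selects whether collided players split the arm's reward equally or all receive zero.

```python
from collections import Counter
from typing import List, Sequence


def collision_rewards(choices: Sequence[int], arm_rewards: Sequence[float],
                      model: str = "none") -> List[float]:
    """Per-player rewards for one MP-MAB round.

    choices[p]     : arm selected by player p
    arm_rewards[a] : reward generated by arm a in this round
    model          : "shared" -> collided players split the reward equally;
                     "none"   -> collided players all receive zero.
    """
    load = Counter(choices)                  # how many players picked each arm
    rewards = []
    for arm in choices:
        if load[arm] == 1:                   # no collision on this arm
            rewards.append(arm_rewards[arm])
        elif model == "shared":
            rewards.append(arm_rewards[arm] / load[arm])
        else:
            rewards.append(0.0)
    return rewards


# Two players collide on arm 0; the third player is alone on arm 1.
print(collision_rewards([0, 0, 1], [4.0, 3.0], "shared"))  # [2.0, 2.0, 3.0]
print(collision_rewards([0, 0, 1], [4.0, 3.0], "none"))    # [0.0, 0.0, 3.0]
```

Under either model a collision wastes part of the available reward, which is why the proposed scheme prevents collisions through centralized information sharing instead of resolving them afterwards.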

4.2. UAV-Hotspot Distribution Optimization Problem Reformulation

Based on the MAB framework explained above, the optimization problem given in (8) can be reformulated as a time-sequential BCMP-MAB game as follows:

$$I_{MN}^{*} = \underset{I_{MN,t} \in \{0,1\}^{M\times N}}{\arg\max}\; \left( \frac{W}{T_H} \sum_{t=1}^{T_H}\sum_{n=1}^{N}\sum_{m=1}^{M_{n,t}} I_{mn,t}\,\Psi_{mn,t} \right), \tag{11a}$$

$$\text{s.t.} \tag{11b}$$

$$T_H \in \mathbb{Z}^{+} \tag{11c}$$

$$\emptyset_{M_{n,t}} \subset \emptyset_M \tag{11d}$$

$$I_{MN} \in \{0,1\}^{M\times N} \tag{11e}$$

$$\sum_{m=1}^{M} I_{mn,t} = 1, \quad \forall n \in \emptyset_N \tag{11f}$$

$$\sum_{n=1}^{N} I_{mn,t} < 2, \quad \forall m \in \emptyset_M \tag{11g}$$

$$E_{nm} \le E_b \tag{11h}$$

where

$$\Psi_{mn,t} = \sum_{k_m=1}^{K_m}\psi_{nk_m,t}, \qquad \psi_{nk_m,t} = \log_2\!\left(1 + P_{r,nk_m,t}/\sigma_0\right). \tag{12}$$

Herein, $1 \le t \le T_H$, where $T_H \in \mathbb{Z}^{+}$ denotes the total time horizon and $\mathbb{Z}^{+}$ is the set of positive integers. In (11), $I_{MN,t}$ denotes the UAV-hotspot assignment matrix at time $t$, and $\Psi_{mn,t}$ is the total spectral efficiency in bps/Hz of hotspot $m$ when covered by UAV $n$ at time $t$. Constraint (11d), i.e., $\emptyset_{M_{n,t}} \subset \emptyset_M$, means that the set of uncovered hotspots available to UAV $n$ at time $t$, with a total number of $M_{n,t}$, is a subset of $\emptyset_M$. This information is sent from the mmWave BS to the UAVs through a dedicated control link. Constraints (11f) and (11g) mean that each UAV covers one hotspot and each hotspot is covered by at most one UAV at time $t$. Thus, the sequential optimization problem in (11) suggests a time-by-time selection of $I_{mn,t}$: at every time $t$, the UAVs select their hotspots sequentially based on the uncovered-hotspot information sent by the BS. In other words, if UAV $n$ selects hotspot $m$ at time $t$, this hotspot is removed from the set of uncovered hotspots available for the selection of UAV $n+1$, i.e., $\emptyset_{M_{n+1,t}} = \emptyset_{M_{n,t}} \setminus \{m\}$.
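The sequential, collision-free selection at each time $t$ and the time-averaged objective (11a) can be sketched as follows. This is a minimal sketch assuming a hypothetical per-UAV rule `pick` (any function that returns one hotspot from the currently available set) and a hypothetical table `psi_seq[t][n][m]` of spectral efficiencies; the BS-side bookkeeping is reduced to a Python set from which each chosen hotspot is removed, mirroring $\emptyset_{M_{n+1,t}} = \emptyset_{M_{n,t}} \setminus \{m\}$.

```python
import random
from typing import Callable, Dict, FrozenSet, Iterable, List, Sequence


def sequential_round(n_uavs: int, hotspots: Iterable[int],
                     pick: Callable[[int, FrozenSet[int]], int],
                     rng: random.Random) -> Dict[int, int]:
    """One time step t: UAVs choose one at a time from the still-uncovered hotspots.

    `pick(n, available)` must return an element of `available`.
    """
    available = set(hotspots)                 # set for the first UAV = all hotspots
    order = list(range(n_uavs))
    rng.shuffle(order)                        # BS picks the UAV order at random
    assignment: Dict[int, int] = {}
    for n in order:
        m = pick(n, frozenset(available))     # UAV n decides on its own
        assignment[n] = m
        available.discard(m)                  # hotspot m is no longer offered
    return assignment


def time_average_rate(assignments: List[Dict[int, int]],
                      psi_seq: Sequence[Sequence[Sequence[float]]],
                      bandwidth: float) -> float:
    """Objective (11a): (W / T_H) * sum over t and n of Psi_{m n, t} for chosen pairs."""
    total = sum(psi_seq[t][n][m]
                for t, a in enumerate(assignments) for n, m in a.items())
    return bandwidth * total / len(assignments)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy rule: every UAV takes the lowest-indexed free hotspot.
    print(sequential_round(3, range(5), lambda n, avail: min(avail), rng))
```

Because each UAV only ever sees hotspots that no earlier UAV has taken, constraints (11f) and (11g) hold regardless of how `pick` ranks the available hotspots.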

4.3. Proposed BCMP-MAB Algorithm

Algorithm 1 gives the proposed BCMP-MAB algorithm, which is inspired by the cost-subsidy MAB algorithm given in [39]. This algorithm is a budget-constrained version of the well-known upper confidence bound (UCB) MAB algorithm [40], in which the cost of an arm is taken into account when choosing the best arm of the bandit. In the original cost-subsidy MAB algorithm [39], after several rounds of pure exploration, the player calculates the UCB and lower confidence bound (LCB) values of the candidate arms. Then, the arms whose UCB values are greater than or equal to $(1-\rho)$ times the maximum LCB value are enumerated, where $\rho \in [0, 1]$ is a design parameter of the cost-subsidy algorithm. Among these candidate arms, the arm with the lowest cost is selected to be played.
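The cost-subsidy selection rule of [39] described above can be written compactly. In this sketch, the confidence radius $\sqrt{2\ln(t)/X}$ is taken from (13) and (14); the argument names and the usage numbers are hypothetical, every arm is assumed to have been pulled at least once, and rewards are assumed nonnegative with $\rho \in [0, 1]$, which guarantees that the feasible set is never empty.

```python
import math
from typing import Sequence


def cost_subsidy_select(means: Sequence[float], counts: Sequence[int],
                        costs: Sequence[float], t: int, rho: float) -> int:
    """Return the arm chosen by the cost-subsidy rule at round t.

    means[a], counts[a]: empirical mean reward and number of pulls of arm a (> 0).
    costs[a]: cost of playing arm a.  rho in [0, 1]: subsidy factor.
    """
    radius = [math.sqrt(2.0 * math.log(t) / c) for c in counts]
    ucb = [mu + r for mu, r in zip(means, radius)]
    lcb = [mu - r for mu, r in zip(means, radius)]
    threshold = (1.0 - rho) * max(lcb)
    # Arms whose UCB reaches (1 - rho) * max LCB are considered good enough.
    feasible = [a for a in range(len(means)) if ucb[a] >= threshold]
    # With nonnegative rewards and rho in [0, 1], this list is never empty.
    return min(feasible, key=lambda a: costs[a])


if __name__ == "__main__":
    means, counts, costs = [5.0, 3.0, 1.0], [100, 100, 100], [10.0, 2.0, 1.0]
    for rho in (0.1, 0.5, 1.0):
        print(rho, cost_subsidy_select(means, counts, costs, t=1000, rho=rho))
    # arms 0, 1 and 2 are chosen for rho = 0.1, 0.5 and 1.0, respectively
```

The usage loop shows the role of $\rho$: a small subsidy keeps only the best-reward arm, while larger values admit cheaper arms whose rewards are within the allowed fraction of the best one.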

Algorithm 1: Proposed BCMP-MAB Algorithm

The inputs to the proposed BCMP-MAB algorithm are $\emptyset_M$, $\emptyset_N$, and $\rho$, while the output is $I_{MN,t}^{*}$, i.e., the UAV-hotspot selection matrix at time $t$. For initialization, at $t = 0$ and for all $m \in \emptyset_M$ and $n \in \emptyset_N$, the number of times hotspot $m$ has been selected by UAV $n$, $X_{mn,t}$, the average spectral efficiency of UAV $n$ when covering hotspot $m$, $\bar{\Psi}_{mn,t}$, and the element $I_{mn,t}$ are all set to 0. The first phase of the BCMP-MAB algorithm is pure exploration, where each UAV $n$ visits every hotspot $m$ and obtains its achievable spectral efficiency and traffic need; this is repeated for $\tau$ rounds. That is, for $1 \le t \le (M+N)\tau$, a temporary number Temp is computed as given in Algorithm 1, where mod denotes the modulo operation and $M+N$ is used to ensure the circulation of the $N$ UAVs over the $M$ hotspots. Then, for $1 \le n \le \mathrm{Temp}$, a hotspot $m_{n,t}^{*}$ is selected for UAV $n$ based on the equation given in Algorithm 1. Afterwards, UAV $n$ flies towards it and obtains its achievable spectral efficiency $\Psi_{m_n^{*}n,t}$ and traffic need $R_{m_n^{*}n,t}$. Then, the number of selections $X_{m_n^{*}n,t}$ and the average spectral efficiency $\bar{\Psi}_{m_n^{*}n,t}$ are updated as given in Algorithm 1. By means of the mod operations, the UAVs visit the hotspots in a circular-shift manner, where only one UAV covers a given hotspot at a time.
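Because the body of Algorithm 1 is not reproduced in this text, its exact Temp-based schedule cannot be shown here. The sketch below instead uses a simpler cyclic-shift schedule, a hypothetical stand-in (UAV $n$ visits hotspot $(n+s) \bmod M$ at step $s$) that illustrates the two properties described above: in every step no two UAVs share a hotspot, and after each block of $M$ steps every UAV has visited every hotspot once. The running-average update of $X_{mn,t}$ and $\bar{\Psi}_{mn,t}$ is likewise an assumption, and `observe` is a hypothetical callback returning a measured spectral efficiency.

```python
from typing import Callable, Iterator, List, Tuple


def cyclic_schedule(n_uavs: int, n_hotspots: int, tau: int) -> Iterator[List[int]]:
    """Hypothetical cyclic-shift exploration: at step s, UAV n visits (n + s) mod M.

    Yields one list per step; entry n is the hotspot visited by UAV n.
    Requires N <= M, so the N UAVs always occupy N distinct hotspots.
    """
    if n_uavs > n_hotspots:
        raise ValueError("the schedule needs N <= M")
    for s in range(tau * n_hotspots):
        yield [(n + s) % n_hotspots for n in range(n_uavs)]


def explore(n_uavs: int, n_hotspots: int, tau: int,
            observe: Callable[[int, int], float]
            ) -> Tuple[List[List[int]], List[List[float]]]:
    """Run the pure-exploration phase and return the counts X and averages Psi-bar."""
    counts = [[0] * n_hotspots for _ in range(n_uavs)]
    means = [[0.0] * n_hotspots for _ in range(n_uavs)]
    for step in cyclic_schedule(n_uavs, n_hotspots, tau):
        for n, m in enumerate(step):
            psi = observe(n, m)                              # measured spectral efficiency
            counts[n][m] += 1
            means[n][m] += (psi - means[n][m]) / counts[n][m]  # running average
    return counts, means


if __name__ == "__main__":
    counts, means = explore(2, 3, 2, lambda n, m: float(n + m))
    print(counts)
```

After this phase every entry of `counts` equals $\tau$, so the confidence radius in (13) and (14) is well defined for all UAV-hotspot pairs when the selection phase starts.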

After the pure exploration phase, hotspot selection is accomplished in the second phase of Algorithm 1. In this phase, for $(M+N)\tau + 1 \le t \le T_H$, the set of uncovered hotspots available for the first UAV, $\emptyset_{M_{1,t}}$, is set to $\emptyset_M$. This first UAV can be selected at random by the mmWave BS. As previously explained, the BS orchestrates the sequential UAV-hotspot selection to avoid collisions among UAVs and satisfy constraints (8d) and (8e). Then, for $1 \le n \le N$, the UCB and LCB values of UAV $n$ for all $m_n \in \emptyset_{M_{n,t}}$ are calculated as follows:

$$\gamma_{m_n n,t}^{UCB} = \bar{\Psi}_{m_n n,t} + \sqrt{2\ln(t)/X_{m_n n,t}}, \quad \forall m_n \in \emptyset_{M_{n,t}}, \tag{13}$$

$$\gamma_{m_n n,t}^{LCB} = \bar{\Psi}_{m_n n,t} - \sqrt{2\ln(t)/X_{m_n n,t}}, \quad \forall m_n \in \emptyset_{M_{n,t}}. \tag{14}$$

Then, the maximum of $\gamma_{m_n n,t}^{LCB}$ is determined as follows:

$$\gamma_{max}^{LCB} = \max_{m_n} \gamma_{m_n n,t}^{LCB}. \tag{15}$$

A feasible set of candidate hotspots for UAV $n$ is constructed as follows:

$$Fs_n(t) = \left\{ m_n : \gamma_{m_n n,t}^{UCB} \ge (1-\rho)\,\gamma_{max}^{LCB} \right\}. \tag{16}$$

From $Fs_{n} (t)$, UAV n selects at time t the hotspot $m_{n, t}^{*}$ with the minimum flying and hovering energy consumption, as given in Algorithm 1. This hotspot selection is conducted autonomously by UAV n based on its corresponding set $\emptyset_{M_{n}, t}$ sent by the BS. When calculating the expected hovering time of hotspot $m_{n}$, the UAV has no prior knowledge of the spectral efficiency and traffic needs of the hotspots, so it uses its previous observations $Ψ_{m_{n} n, t - 1}$ and $R_{m_{n} n, t - 1}$ at time $t - 1$, as given in Algorithm 1. After selecting $m_{n, t}^{*}$, UAV n flies towards it and covers it. Then, its corresponding element $I_{m_{n}^{*} n, t}$ in the UAV-hotspot selection matrix is set to 1, and its associated parameters $X_{m_{n}^{*} n, t}$ and $\bar{Ψ}_{m_{n}^{*} n, t}$ are updated as given in Algorithm 1. Moreover, the selected hotspot is removed from the set of available hotspots $\emptyset_{M_{n + 1}, t}$ for the next UAV $n + 1$, which can also be selected at random by the BS, as given in Algorithm 1. The BS collects the set $\emptyset_{M_{n + 1}, t}$ and sends it to UAV $n + 1$, so that the UAV-hotspot selections are carried out one by one and UAV collisions are prevented, as previously explained.
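The BS-scheduled, one-by-one selection described above can be sketched as follows. The sketch takes each UAV's UCB and LCB values from Eqs. (13) and (14) plus a precomputed expected energy per hotspot as inputs (how that energy is estimated is left abstract here), serves the UAVs in a random order, restricts each UAV to its feasibility group over the hotspots still available, picks the lowest-energy one, and removes it before serving the next UAV. The function name and input layout are hypothetical and are not the authors' implementation.

```python
import random


def sequential_selection(ucb, lcb, energy, rho, num_hotspots, rng=None):
    """One slot of BS-scheduled, collision-free UAV-hotspot selection.

    ucb[n][m], lcb[n][m]: bounds of UAV n for hotspot m, as in Eqs. (13)-(14).
    energy[n][m]: UAV n's expected flying + hovering energy for hotspot m
                  (hypothetical precomputed input in this sketch).
    Returns {uav: hotspot}, i.e., the entries with I_{m n, t} = 1.
    Assumes 0 <= rho <= 1 and non-negative average spectral efficiencies, so
    the hotspot attaining the maximum LCB always passes the Eq. (16) test and
    every feasibility group is non-empty.
    """
    rng = rng or random.Random()
    available = set(range(num_hotspots))  # the first UAV sees all M hotspots
    order = list(ucb)
    rng.shuffle(order)                    # BS picks the serving order at random
    selection = {}
    for n in order:
        if not available:                 # more UAVs than hotspots: rest stay idle
            break
        lcb_max = max(lcb[n][m] for m in available)                 # Eq. (15)
        feasible = [m for m in available
                    if ucb[n][m] >= (1.0 - rho) * lcb_max]          # Eq. (16)
        best = min(feasible, key=lambda m: energy[n][m])            # lowest energy
        selection[n] = best
        available.discard(best)           # BS removes it before serving UAV n + 1
    return selection
```

Because each hotspot leaves the available set as soon as it is chosen, no two UAVs can end up on the same hotspot in a slot, which is the collision-avoidance property that the benchmark schemes lack.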

5. Numerical Analysis

In this section, Monte Carlo (MC) numerical simulations are conducted to demonstrate the effectiveness of the proposed BCMP-MAB algorithm over other benchmarks in different scenarios. In the simulations, a simulation area of 25 km² is established, in which 100 hotspots are uniformly distributed. Different numbers of UAV-mounted RIS are used to cover these hotspots, depending on the simulation scenario. Each attached RIS board has a random number of antenna elements. The altitude of the UAVs is set to 6 m. In addition, each UAV has two statuses: the flying status, when it flies towards a hotspot with a flying speed of $V_{f}$ = 5 km/h, and the hovering status, when it covers a hotspot. Each hotspot contains a random number of UEs with random traffic needs, as given in Table 2, which summarizes the simulation parameters used in the numerical simulations unless otherwise stated. For comparison, random (Rand) selection, where UAV n arbitrarily selects its associated hotspot, is provided. In addition, the performance of nearest hotspot selection is given, where UAVs always choose their nearest hotspots. Moreover, the performance of naïve UCB is shown, where only $γ_{m_{n} n, t}^{UCB}$ is calculated as given in Algorithm 1, and the hotspot with the maximum $γ_{m_{n} n, t}^{UCB}$ value is selected. Finally, maximum rate-based (max rate) selection is given, where each UAV always selects the hotspot maximizing its achievable data rate in bps, irrespective of its energy consumption. In all compared schemes, i.e., Rand, nearest, naïve UCB, and max rate, neither UAV collision avoidance nor UAV energy minimization is considered. This means that two or more UAVs can cover the same hotspot, in which case the achievable hotspot data rate is shared among them.

5.1. Adjusting the Value of $ρ$

In this part of the numerical analysis, we adjust the value of ρ, with N set to 20 UAVs. Figure 3 and Figure 4 give the average sum rate in Gbps and the average energy consumption in Joules of the UAVs using the BCMP-MAB algorithm against the value of ρ. When ρ is equal to 0, only candidate hotspots with high UCB values, i.e., high average spectral efficiencies, are included in the feasibility group $Fs_{n} (t)$ of UAV n, as given in Algorithm 1. This results in high average sum rates, but at the expense of high UAV energy consumption. On the other hand, when ρ is equal to 1, all available hotspots are included in $Fs_{n} (t)$. This results in low UAV energy consumption, as the hotspot with the lowest energy consumption is always selected by UAV n, but at the expense of a low average sum rate. From both figures, ρ = 0.6 is chosen as a suitable value of ρ, as given in Table 2. This is because, at ρ = 0.6, the average sum rate is only slightly reduced, to 93.8% of its maximum value, while the average energy consumption is substantially reduced, to 68% of its maximum value.

5.2. Performance against Number of UAVs

Figure 5 shows the average sum rate against the number of UAV-mounted RIS used. The proposed BCMP-MAB clearly has the best performance among the compared schemes. This stems from its trade-off between maximizing the achievable data rate and minimizing UAV energy consumption. Moreover, the proposed UAV scheduling mechanism orchestrated by the mmWave BS eliminates collisions, i.e., reward sharing among UAVs, hence maximizing the average sum rate compared to the other benchmarks. Interestingly, the average sum rate of naïve UCB matches that of max rate. This is because, in both schemes, each UAV always selects the hotspot with the highest average data rate. Both schemes, however, achieve a lower average sum rate than the proposed BCMP-MAB because they have no collision avoidance mechanism. Thus, multiple UAVs can cover the same hotspot and share its achievable data rate, while many other hotspots are left uncovered. Rand and nearest show the worst performance, with Rand slightly better than nearest due to the randomness in selecting the associated hotspot. At N = 10, the proposed BCMP-MAB achieves an average sum rate 1.36, 9.52, and 10.19 times higher than naïve UCB/max rate, Rand, and nearest, respectively. At N = 100, these values become 1.35, 2.12, and 2.54 times, respectively.

Figure 6 shows the energy efficiency, in Gbps/J, of the compared schemes. As shown in this figure, the proposed BCMP-MAB has the best energy efficiency because it trades off maximizing the achievable data rate against minimizing the energy consumption, while keeping the UAVs collision-free. In addition, nearest shows better energy efficiency than naïve UCB, max rate, and Rand. This is because it greatly reduces the flying energy consumption of the UAVs by selecting the hotspots nearest to them. Again, naïve UCB and max rate show almost the same energy efficiency due to their identical objective. At N = 10, the proposed BCMP-MAB algorithm obtains about 47, 265, and 59.3 times higher energy efficiency than naïve UCB/max rate, Rand, and nearest, respectively. These values become 54.6, 71.5, and 18.76 times at N = 100, respectively.

5.3. Performance against TX Power

In this part of the numerical analysis, we evaluate the performance of the compared schemes against the TX power $P_{t}$ in dBm. Figure 7 shows the average sum rate of the compared schemes against $P_{t}$ using N = 20. From this figure, the proposed BCMP-MAB has the best performance, especially for high $P_{t}$ values, owing to its spectral efficiency maximization combined with UAV collision avoidance. Again, naïve UCB and max rate show almost the same performance, and Rand-based selection outperforms nearest, owing to their respective selection policies. At $P_{t}$ = 10 dBm, the proposed BCMP-MAB algorithm obtains an average sum rate 1.44, 6.35, and 9.4 times higher than naïve UCB/max rate, Rand, and nearest, respectively. At $P_{t}$ = 60 dBm, it obtains about 2.52, 3.13, and 3.14 times higher average sum rate than naïve UCB/max rate, Rand, and nearest, respectively.

Figure 8 gives the average energy efficiency of the compared schemes against $P_{t}$. As shown, the proposed BCMP-MAB has the best performance for all tested $P_{t}$ values. In addition, naïve UCB and max rate have almost the same performance, for the reasons given above. Rand has the worst energy efficiency, especially for high values of $P_{t}$. Interestingly, nearest hotspot-based selection has lower energy efficiency than naïve UCB, max rate, and even Rand at low $P_{t}$ values, while it outperforms them at high $P_{t}$ values. This is because, at very low $P_{t}$ values, the hovering time is considerable, which makes the hovering energy consumption larger than the flying energy consumption and hence the dominant term. Thus, all compared schemes without energy minimization features incur almost the same energy consumption, and their average sum rates become the dominant factor differentiating their energy efficiencies. At high values of $P_{t}$, the opposite happens, i.e., the flying energy consumption exceeds the hovering energy consumption. Consequently, the nearest scheme has lower energy consumption than naïve UCB, max rate, and Rand, which improves its energy efficiency over them, as shown in Figure 8. At $P_{t}$ = 10 dBm, the energy efficiency of the proposed BCMP-MAB is 52.36, 221.76, and 342.72 times higher than that of naïve UCB/max rate, Rand, and nearest, respectively. At $P_{t}$ = 60 dBm, these values become 6419.5, 4291.8, and 425.2 times, respectively.
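The hovering/flying crossover can be made concrete with a rough energy model. Assuming, for illustration only, that the flying energy is $P_{f}$ times the travel time $d / V_{f}$ and that a UAV hovers until a hotspot's traffic need R is served at rate $W Ψ$ (so the hovering energy is $P_{h} R / (W Ψ)$), the Table 2 values give the Python sketch below. The 1 km distance, the 40 Gbit traffic need, and the two spectral-efficiency values standing in for low and high $P_{t}$ are hypothetical, and the paper's exact energy model may differ.

```python
# Table 2 parameters
P_F = 4.0          # flying power P_f in W
P_H = 2.0          # hovering power P_h in W
V_F = 5.0 / 3.6    # flying speed V_f: 5 km/h converted to m/s
W = 2.16e9         # bandwidth W in Hz


def flying_energy(distance_m: float) -> float:
    """Assumed model: flying power times travel time d / V_f (joules)."""
    return P_F * distance_m / V_F


def hovering_energy(traffic_bits: float, spectral_eff: float) -> float:
    """Assumed model: hover until R bits are delivered at rate W * Psi (joules)."""
    hovering_time_s = traffic_bits / (W * spectral_eff)
    return P_H * hovering_time_s


if __name__ == "__main__":
    distance = 1000.0  # hypothetical UAV-to-hotspot distance in m
    traffic = 40e9     # hypothetical hotspot traffic need in bits
    for label, psi in (("low P_t", 0.01), ("high P_t", 5.0)):
        e_fly = flying_energy(distance)
        e_hov = hovering_energy(traffic, psi)
        dominant = "hovering" if e_hov > e_fly else "flying"
        print(f"{label}: flying {e_fly:.0f} J, hovering {e_hov:.1f} J -> {dominant} dominates")
```

With these numbers, flying 1 km costs 2880 J, while hovering costs about 3.7 kJ at Ψ = 0.01 bps/Hz but only about 7.4 J at Ψ = 5 bps/Hz, so the dominant term switches from hovering to flying as the link improves.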

5.4. Convergence Analysis

Figure 9 and Figure 10 show the convergence of the compared schemes against the time horizon. In these simulations, $P_{t}$ is set to 1 W and M = 100, with N = 20 in Figure 9 and N = 100 in Figure 10. From both figures, the proposed BCMP-MAB converges quickly, at a rate comparable to that of naïve UCB. In both cases, at t = 30, the average sum rates of the proposed BCMP-MAB and naïve UCB reach about 96% of their maximum values at t = 1000. In addition, the average sum rate of naïve UCB converges to that of the max rate scheme.

5.5. Computational Complexity Comparisons

In the naïve UCB algorithm, the computational complexity comes from calculating the UCB values of the hotspots for each UAV and updating their corresponding parameters, with a computational complexity of $O(N(M + 1))$ [31,40]. The computational complexity of the proposed BCMP-MAB consists of two parts. The first part comes from the pure exploration phase, where each UAV visits all available hotspots several times and updates their corresponding parameters, with a computational complexity of $O(N)$. The second part comes from the hotspot selection phase, which is similar to naïve UCB except that both UCB and LCB values are calculated. Then, the parameters of the selected hotspots are updated. For simplicity, let us neglect the elimination of previously selected hotspots. Thus, the upper bound of the computational complexity of the second BCMP-MAB phase can be approximated as $O(N(2M + 1))$ [39]. Consequently, the upper bound of the total computational complexity of the proposed BCMP-MAB can be written as $O(N) + O(N(2M + 1))$. For the nearest and maximum rate hotspot-based selections, the distances and the expected rates between UAVs and hotspots are calculated, respectively. Then, the selection decision is taken individually for each UAV, with a total computational complexity of $O(NM)$ for both schemes. For random selection, each UAV selects a random hotspot out of the M hotspots, with a total computational complexity of $O(N)$. Thus, the computational complexity of the proposed BCMP-MAB is approximately double that of naïve UCB, max rate, and nearest, and almost 2M times that of random selection. Nevertheless, as the above simulation results show, the improvements in energy efficiency obtained by the proposed BCMP-MAB outweigh this increase in computational complexity.
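As a quick arithmetic check of these ratios, the leading big-O expressions can be evaluated as plain operation counts. Treating the asymptotic expressions as exact counts is a simplification made only for this illustration; the scenario M = 100, N = 20 is taken from the simulation settings, and the function name is hypothetical.

```python
def op_counts(num_uavs: int, num_hotspots: int) -> dict[str, int]:
    """Evaluate the leading complexity expressions of Section 5.5 as plain counts."""
    n, m = num_uavs, num_hotspots
    return {
        "bcmp_mab": n + n * (2 * m + 1),  # O(N) + O(N(2M + 1))
        "naive_ucb": n * (m + 1),         # O(N(M + 1))
        "nearest_or_max_rate": n * m,     # O(NM)
        "random": n,                      # O(N)
    }


if __name__ == "__main__":
    counts = op_counts(num_uavs=20, num_hotspots=100)
    for scheme in ("naive_ucb", "nearest_or_max_rate", "random"):
        print(scheme, counts["bcmp_mab"] / counts[scheme])
```

For M = 100 and N = 20, the ratios come out to 2.0, 2.02, and 202 (that is, 2M + 2). The ratio to naïve UCB is exactly 2 for any N and M, because N + N(2M + 1) = 2N(M + 1).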

6. Conclusions

In this paper, we proposed using multiple UAV-mounted RIS to cover dense hotspots using mmWave links. The problem of distributing UAVs among the hotspots was formulated as an optimization problem with the aim of maximizing the achievable hotspot sum rate while minimizing both the UAVs' flying and hovering energy consumption. To efficiently address this problem within its constraints, it was reformulated as a time-sequential budget constraint MAB problem. Then, a BCMP-MAB algorithm was proposed to solve it, where UAVs functioned as the players, hotspots as the bandit's arms, and achievable rates as the rewards. Moreover, collision avoidance among UAVs and the UAV budget constraint were considered while maximizing the achievable sum rate. The proposed algorithm showed superior average sum rate and energy efficiency compared to naïve UCB, max rate, random, and nearest-based hotspot selection. For example, at N = 10 and the default $P_{t}$ of 1 W, the proposed BCMP-MAB achieved 47, 265, and 59.3 times higher energy efficiency than naïve UCB/max rate, Rand, and nearest, respectively. In addition, at N = 20 and $P_{t}$ = 60 dBm, these values became 6419.5, 4291.8, and 425.2, respectively. These significant enhancements come at only double the computational complexity of naïve UCB, max rate, and nearest, and almost 2M times that of random selection. Although we proposed the BCMP-MAB algorithm to address the multi-UAV mounted RIS distribution problem, other solutions such as Q-learning and DRL are applicable as well. However, more investigation is needed to study their realization and to benchmark their performance against the proposed BCMP-MAB scheme. Moreover, the turbulence effect of UAVs due to propeller rotation in conjunction with RIS communication will be one of our future research directions.

Author Contributions

Conceptualization, E.M.M. and S.H.; methodology, E.M.M.; software, E.M.M.; validation, E.M.M., S.H. and M.A.-N.; formal analysis, E.M.M.; investigation, E.M.M.; resources, E.M.M. and M.A.; data curation, E.M.M.; writing—original draft preparation, E.M.M. and S.H.; writing—review and editing, M.A.-N.; visualization, M.A.-N.; supervision, E.M.M.; project administration, E.M.M. and M.A.; funding acquisition, E.M.M. and M.A. All authors have read and agreed to the published version of the manuscript.

Funding

The authors extend their appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia for funding this research work through the project number (IF2/PSAU/2022/01/21627).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Sakaguchi, K.; Mohamed, E.M.; Kusano, H.; Mizukami, M.; Miyamoto, S.; Rezagah, R.E.; Takinami, K.; Takahashi, K.; Shirakata, N.; Peng, H.; et al. Millimeter-wave Wireless LAN and its Extension toward 5G Heterogeneous Networks. IEICE Trans. Commun. 2015, 98-B, 1932–1948.
2. Mohamed, E.M.; Sakaguchi, K.; Sampei, S. Wi-Fi Coordinated WiGig Concurrent Transmissions in Random Access Scenarios. IEEE Trans. Veh. Technol. 2017, 66, 10357–10371.
3. Rappaport, T.S.; Sun, S.; Mayzus, R.; Zhao, H.; Azar, Y.; Wang, K.; Wong, G.N.; Schulz, J.K.; Samimi, M.; Gutierrez, F. Millimeter Wave Mobile Communications for 5G Cellular: It Will Work! IEEE Access 2013, 1, 335–349.
4. Abdelreheem, A.; Mohamed, E.M.; Esmaiel, H. Adaptive location-based millimetre wave beamforming using compressive sensing based channel estimation. IET Commun. 2019, 13, 1287–1296.
5. ElMossallamy, M.A.; Zhang, H.; Song, L.; Seddik, K.G.; Han, Z.; Li, G.Y. Reconfigurable Intelligent Surfaces for Wireless Communications: Principles, Challenges, and Opportunities. IEEE Trans. Cogn. Commun. Netw. 2020, 6, 990–1002.
6. Mohamed, E.M.; Hashima, S.; Hatano, K.; Aldossari, S.A. Two-Stage Multiarmed Bandit for Reconfigurable Intelligent Surface Aided Millimeter Wave Communications. Sensors 2022, 22, 2179.
7. Björnson, E.; Özdogan, Ö.; Larsson, E.G. Intelligent Reflecting Surface Versus Decode-and-Forward: How Large Surfaces are Needed to Beat Relaying? IEEE Wirel. Commun. Lett. 2020, 9, 244–248.
8. Cui, Z.; Guan, K.; Zhang, J.; Zhong, Z. SNR Coverage Probability Analysis of RIS-Aided Communication Systems. IEEE Trans. Veh. Technol. 2021, 70, 3914–3919.
9. Pei, X.; Yin, H.; Tan, L.; Cao, L.; Li, Z.; Wang, K.; Zhang, K.; Björnson, E. RIS-Aided Wireless Communications: Prototyping, Adaptive Beamforming, and Indoor/Outdoor Field Trials. IEEE Trans. Commun. 2021, 69, 8627–8640.
10. Tang, W.; Li, X.; Dai, J.Y.; Jin, S.; Zeng, Y.; Cheng, Q.; Cui, T.J. Wireless communications with programmable metasurface: Transceiver design and experimental results. China Commun. 2019, 16, 46–61.
11. Nguyen, N.T.; Vu, Q.D.; Lee, K.; Juntti, M. Hybrid Relay-Reflecting Intelligent Surface-Assisted Wireless Communications. IEEE Trans. Veh. Technol. 2022, 71, 6228–6244.
12. Zhao, D.; Lu, H.; Wang, Y.; Sun, H. Joint Passive Beamforming and User Association Optimization for IRS-assisted mmWave Systems. In Proceedings of the GLOBECOM 2020—2020 IEEE Global Communications Conference, Taipei, Taiwan, 7–11 December 2020; pp. 1–6.
13. Du, H.; Zhang, J.; Cheng, J.; Ai, B. Millimeter Wave Communications With Reconfigurable Intelligent Surfaces: Performance Analysis and Optimization. IEEE Trans. Commun. 2021, 69, 2752–2768.
14. Mohamed, E.M.; Hashima, S.; Anjum, N.; Hatano, K.; Shafai, W.E.; Elhlawany, B.M. Reconfigurable intelligent surface-aided millimetre wave communications utilizing two-phase minimax optimal stochastic strategy bandit. IET Commun. 2022, 16, 2200–2207.
15. Mohamed, E.M.; Hashima, S.; Aldosary, A.; Hatano, K.; Abdelghany, M.A. Gateway Selection in Millimeter Wave UAV Wireless Networks Using Multi-Player Multi-Armed Bandit. Sensors 2020, 20, 3947.
16. Zhan, P.; Yu, K.; Swindlehurst, A.L. Wireless Relay Communications with Unmanned Aerial Vehicles: Performance and Optimization. IEEE Trans. Aerosp. Electron. Syst. 2011, 47, 2068–2085.
17. Mkiramweni, M.E.; Yang, C.; Li, J.; Zhang, W. A Survey of Game Theory in Unmanned Aerial Vehicles Communications. IEEE Commun. Surv. Tutor. 2019, 21, 3386–3416.
18. Mozaffari, M.; Saad, W.; Bennis, M.; Nam, Y.H.; Debbah, M. A Tutorial on UAVs for Wireless Networks: Applications, Challenges, and Open Problems. IEEE Commun. Surv. Tutor. 2019, 21, 2334–2360.
19. Auer, P.; Cesa-Bianchi, N.; Fischer, P. Finite-time Analysis of the Multiarmed Bandit Problem. Mach. Learn. 2002, 47, 235–256.
20. Audibert, J.Y.; Munos, R.; Szepesvari, C. Exploration-exploitation tradeoff using variance estimates in multi-armed bandits. Theor. Comput. Sci. 2009, 410, 1876–1902.
21. Yang, F.; Wang, J.B.; Zhang, H.; Lin, M.; Cheng, J. Intelligent Reflecting Surface Assisted mmWave Communication Using Mixed Timescale Channel State Information. IEEE Trans. Wirel. Commun. 2022, 21, 5673–5687.
22. Chen, Y.; Wang, Y.; Jiao, L. Robust Transmission for Reconfigurable Intelligent Surface Aided Millimeter Wave Vehicular Communications With Statistical CSI. IEEE Trans. Wirel. Commun. 2022, 21, 928–944.
23. Li, L.; Ma, D.; Ren, H.; Wang, D.; Tang, X.; Liang, W.; Bai, T. Enhanced reconfigurable intelligent surface assisted mmWave communication: A federated learning approach. China Commun. 2020, 17, 115–128.
24. Liu, Y.; Zhang, S.; Gao, F.; Tang, J.; Dobre, O.A. Cascaded Channel Estimation for RIS Assisted mmWave MIMO Transmissions. IEEE Wirel. Commun. Lett. 2021, 10, 2065–2069.
25. He, J.; Wymeersch, H.; Juntti, M. Channel Estimation for RIS-Aided mmWave MIMO Systems via Atomic Norm Minimization. IEEE Trans. Wirel. Commun. 2021, 20, 5786–5797.
26. Pradhan, C.; Li, A.; Song, L.; Vucetic, B.; Li, Y. Hybrid Precoding Design for Reconfigurable Intelligent Surface Aided mmWave Communication Systems. IEEE Wirel. Commun. Lett. 2020, 9, 1041–1045.
27. Taha, A.; Alrabeiah, M.; Alkhateeb, A. Enabling Large Intelligent Surfaces With Compressive Sensing and Deep Learning. IEEE Access 2021, 9, 44304–44321.
28. Jia, C.; Gao, H.; Chen, N.; He, Y. Machine learning empowered beam management for intelligent reflecting surface assisted MmWave networks. China Commun. 2020, 17, 100–114.
29. Zhao, D.; Lu, H.; Wang, Y.; Sun, H.; Gui, Y. Joint Power Allocation and User Association Optimization for IRS-Assisted mmWave Systems. IEEE Trans. Wirel. Commun. 2022, 21, 577–590.
30. Wang, W.; Zhang, W. Joint Beam Training and Positioning for Intelligent Reflecting Surfaces Assisted Millimeter Wave Communications. IEEE Trans. Wirel. Commun. 2021, 20, 6282–6297.
31. Mohamed, E.M.; Hashima, S.; Hatano, K. Energy Aware Multiarmed Bandit for Millimeter Wave-Based UAV Mounted RIS Networks. IEEE Wirel. Commun. Lett. 2022, 11, 1293–1297.
32. Zhang, Q.; Saad, W.; Bennis, M. Reflections in the Sky: Millimeter Wave Communication with UAV-Carried Intelligent Reflectors. In Proceedings of the 2019 IEEE Global Communications Conference (GLOBECOM), Waikoloa, HI, USA, 9–13 December 2019; pp. 1–6.
33. Zhang, Q.; Saad, W.; Bennis, M. Distributional Reinforcement Learning for mmWave Communications with Intelligent Reflectors on a UAV. In Proceedings of the GLOBECOM 2020—2020 IEEE Global Communications Conference, Taipei, Taiwan, 7–11 December 2020; pp. 1–6.
34. Guo, X.; Chen, Y.; Wang, Y. Learning-Based Robust and Secure Transmission for Reconfigurable Intelligent Surface Aided Millimeter Wave UAV Communications. IEEE Wirel. Commun. Lett. 2021, 10, 1795–1799.
35. Jiang, L.; Jafarkhani, H. Reconfigurable Intelligent Surface Assisted mmWave UAV Wireless Cellular Networks. In Proceedings of the ICC 2021—IEEE International Conference on Communications, Montreal, QC, Canada, 14–18 June 2021; pp. 1–6.
36. Xiong, B.; Zhang, Z.; Jiang, H.; Zhang, J.; Wu, L.; Dang, J. A 3D Non-Stationary MIMO Channel Model for Reconfigurable Intelligent Surface Auxiliary UAV-to-Ground mmWave Communications. IEEE Trans. Wirel. Commun. 2022, 21, 5658–5672.
37. Mohamed, E.M.; Hashima, S.; Hatano, K.; Aldossari, S.A.; Zareei, M.; Rihan, M. Two-Hop Relay Probing in WiGig Device-to-Device Networks Using Sleeping Contextual Bandits. IEEE Wirel. Commun. Lett. 2021, 10, 1581–1585.
38. Ntontin, K.; Boulogeorgos, A.A.A.; Selimis, D.G.; Lazarakis, F.I.; Alexiou, A.; Chatzinotas, S. Reconfigurable Intelligent Surface Optimal Placement in Millimeter-Wave Networks. IEEE Open J. Commun. Soc. 2021, 2, 704–718.
39. Sinha, D.; Abinav Sankararaman, K.; Kazerouni, A.; Avadhanula, V. Multi-Armed Bandits with Cost Subsidy. In Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, Virtual, 13–15 April 2021; Volume 130, pp. 3016–3024.
40. Francisco-Valencia, I.; Marcial-Romero, J.R.; Valdovinos, R.M. A comparison between UCB and UCB-Tuned as selection policies in GGP. J. Intell. Fuzzy Syst. 2019, 36, 5073–5079.
41. Mohamed, E.M.; Hashima, S.; Hatano, K.; Fouda, M.M. Cost-Effective MAB Approaches for Reconfigurable Intelligent Surface Aided Millimeter Wave Relaying. IEEE Access 2022, 10, 81642–81653.

Figure 1. Proposed system model of multi mmWave UAV mounted RIS hotspot area coverage.

Figure 2. Schematic diagram of the mmWave BS, UAV mounted RIS, UE communication links.

Figure 3. Average sum rate against the value of ρ.


Figure 4. Average energy consumption against the value of ρ.


Figure 5. Average sum rate against number of UAVs.

Figure 6. Average energy efficiency against number of UAVs.

Figure 7. Average sum rate against $P_{t}$.


Figure 8. Average energy efficiency in bps/J against $P_{t}$.


Figure 9. Average sum rate convergence against the time horizon using M = 100 and N = 20.


Figure 10. Average sum rate convergence against the time horizon using M = 100 and N = 50.


Table 1. Literature review comparison in RIS assisted mmWave UAV communications.

Reference	Objective	Single/Multi-UAV	Fixed/Mounted
Mohamed, E. M. et al. 2022 [31]	Optimizing the trajectory of UAV mounted RIS	Single	Mounted
Zhang, Q. et al. 2019 [32]	Optimizing the performance of UAV mounted RIS	Single	Mounted
Zhang, Q. et al. 2020 [33]	Optimize the precoding matrix at the BS and the PSs at the RIS	Single	Mounted
Guo, X. et al. 2021 [34]	Enhance the secrecy rate of the mmWave UAV communication	Single	Fixed
Jiang, L. et al. 2021 [35]	Multiple RIS boards were used to aid UAV-enabled mmWave cellular communications	Single	Fixed
Xiong, B. et al. 2022 [36]	An RIS board was used as an auxiliary to enhance the performance of UAV-enabled mmWave communications	Single	Fixed

Table 2. Simulation Parameters.

Parameter	Value
$P_{t}$ , $P_{f}$ , $P_{h}$	1, 4, 2 Watts [31]
$V_{f}$	5 km/h [31]
W	2.16 GHz [41]
$λ$	0.005 [41]
$Γ$	0.9 [38]
M	100
$Q_{n}$	Uniformly random in the range [32, 512]
$d_{0}$	5 m [41]
$α_{LoS}, α_{NLoS}, α$	2.2, 3.88, 2 [41]
$δ_{LoS}, δ_{NLoS}$	10.3, 14.6 [41]
$θ_{-3dB}, ϕ_{-3dB}$	30°
$ρ$	$0.6$
$T_{H}$	1000
$σ_{0}$ (dBm)	$-174 + 10 \log_{10}(W) + 10$ [31]
$R_{k_{i}}$	Uniformly random in the range [10, 70] Gbit [31]
$τ$	${(T_{H} / M)}^{2 / 3}$ [39]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Mohamed, E.M.; Alnakhli, M.; Hashima, S.; Abdel-Nasser, M. Distribution of Multi MmWave UAV Mounted RIS Using Budget Constraint Multi-Player MAB. Electronics 2023, 12, 12. https://doi.org/10.3390/electronics12010012

AMA Style

Mohamed EM, Alnakhli M, Hashima S, Abdel-Nasser M. Distribution of Multi MmWave UAV Mounted RIS Using Budget Constraint Multi-Player MAB. Electronics. 2023; 12(1):12. https://doi.org/10.3390/electronics12010012

Chicago/Turabian Style

Mohamed, Ehab Mahmoud, Mohammad Alnakhli, Sherief Hashima, and Mohamed Abdel-Nasser. 2023. "Distribution of Multi MmWave UAV Mounted RIS Using Budget Constraint Multi-Player MAB" Electronics 12, no. 1: 12. https://doi.org/10.3390/electronics12010012


