Joint Data Transmission and Energy Harvesting for MISO Downlink Transmission Coordination in Wireless IoT Networks
Abstract
1. Introduction
1.1. Related Work
1.2. The Motivations and Characteristics of This Work
 We derive a multi-objective optimization (MOO) formulation to obtain the optimal BP and PR for MISO downlink SWIPT-enabled wireless networks under the logarithmic nonlinear EH model. Then, with a weighted-sum approach, we transform this formulation into an objective function for the resulting multiple-ratio FP problem.
 To solve the nonconvex FP problem, instead of using the commonly considered Dinkelbach’s transformation, we develop an evolutionary algorithm (EA)-aided quadratic transform technique that first obtains the desired PR with the EA and then feeds it to an effective iterative algorithm for near-optimal solutions.
 To further reduce the computational complexity while avoiding the collection of global channel state information (CSI), we propose a distributed multi-agent learning-based approach that requires only partial observations of the CSI. Specifically, we develop a multi-agent double DQN (DDQN) algorithm that lets each BS decide its BP and PR based only on local observations, with lower communication and computation overheads.
 Instead of centralized operations, such as centralized training centralized executing (CTCE) and centralized training distributed executing (CTDE), we adopt a distributed training distributed executing (DTDE) scheme, in which each single agent or BS performs its offline training and online decision making distributively and independently, limiting the amount of information exchanged between neighboring BSs.
 We verify the trade-off between SE and EH with simulations and show that our proposal can outperform the state-of-the-art centralized learning-based algorithm, Advantage Actor-Critic (A2C), and baseline approaches such as the greedy and random algorithms. More specifically, in addition to the introduced FP algorithm providing superior solutions, the proposed DDQN algorithm achieves a utility up to 1.23, 1.87, and 3.45 times larger than that of the A2C, greedy, and random algorithms, respectively.
2. System Model and Problem Formulation
2.1. Network and Channel Models
2.2. Multi-Objective Optimization
2.3. Problem Formulation
3. Fractional Programming-Based Approach
Algorithm 1 EA-aided FP algorithm.

4. Limited Channel Information Exchange
5. Learning-Based Approach
5.1. Overview of DDQN
5.2. Distributed Multi-Agent DDQN Algorithm
 (1) Action: In this algorithm, each action of agent k, ${a}_{k}$, is composed of the BP ${\mathit{\omega}}_{k}$ and the PR ${\theta}_{k}$. As the action space of a value-based DRL algorithm must be finite, the feasible actions should be taken from sets of discrete values of ${\mathit{\omega}}_{k}$ and ${\theta}_{k}$, respectively. Here, as each BP is a complex vector, it should be discretized with real values. To this end, it is first decomposed into two parts as$${\mathit{\omega}}_{k}=\sqrt{{P}_{k}}\,{\overline{\mathit{\omega}}}_{k}$$where the transmit power ${P}_{k}$ is taken from a set $\mathcal{P}$ of discrete power levels, while ${\overline{\mathit{\omega}}}_{k}$ can be discretized by using a codebook $\mathcal{C}=\left\{{\mathbf{c}}_{0},\dots ,{\mathbf{c}}_{{N}_{code}-1}\right\}$ composed of ${N}_{code}$ code vectors ${\mathbf{c}}_{q}\in {\mathbb{C}}^{{N}_{t}\times 1}$, each specifying a beam direction in $[0,2\pi )$. Providing a sufficient number of codes, ${N}_{code}\ge {N}_{t}$, and a number S of available phase values for each antenna element, we can consider a codebook matrix $\mathbf{C}$ similar to that in [47]. Specifically, for the ${n}_{t}$th antenna element in the qth code, its value is given by$$\mathbf{C}[{n}_{t},q]=\frac{1}{\sqrt{{N}_{t}}}\exp \left(j\frac{2\pi}{S}\left\lfloor \frac{{n}_{t}\cdot \mathrm{mod}(q+\frac{{N}_{code}}{2},{N}_{code})}{{N}_{code}/S}\right\rfloor \right)$$Apart from the BP, we can similarly discretize each PR ${\theta}_{k}$ into ${N}_{eh}$ levels with a set $\mathcal{E}=\left\{0,\frac{1}{{N}_{eh}-1},\frac{2}{{N}_{eh}-1},\dots ,1\right\}$ representing the values to be selected. Finally, by taking all the discrete-value sets into account, we have the action space for each agent as$$\mathcal{A}=\left\{(p,c,e)\mid p\in \mathcal{P},\,c\in \mathcal{C},\,e\in \mathcal{E}\right\}$$
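The codebook construction and action-space discretization described above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation; the function names and the explicit list of power levels passed to `build_action_space` are our assumptions, while $N_t$, $N_{code}$, and $S$ follow the text.

```python
import itertools
import numpy as np

def build_codebook(n_t, n_code, s):
    """Codebook matrix C: column q is one beam direction; each of the N_t
    antenna entries takes one of S discrete phases (cf. the equation above)."""
    c = np.empty((n_t, n_code), dtype=complex)
    for nt, q in itertools.product(range(n_t), range(n_code)):
        # floor(n_t * mod(q + N_code/2, N_code) / (N_code / S)) picks the phase step
        step = np.floor(nt * ((q + n_code // 2) % n_code) / (n_code / s))
        c[nt, q] = np.exp(1j * 2 * np.pi * step / s) / np.sqrt(n_t)
    return c

def build_action_space(power_levels, n_code, n_eh):
    """Cartesian product A = P x C x E of power levels, beam-direction
    indices, and discrete power-splitting ratios E = {0, 1/(N_eh-1), ..., 1}."""
    pr_levels = np.linspace(0.0, 1.0, n_eh)  # the set E
    return list(itertools.product(power_levels, range(n_code), pr_levels))
```

Note that each codebook column has unit norm (every entry has magnitude $1/\sqrt{N_t}$), so the beamformer's power is carried entirely by $\sqrt{P_k}$.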
 (2) Reward: Apart from selecting the PR within $[0,1]$ from $\mathcal{E}$ to comply with the feasible PR constraint, the MOO problem is also required to meet the transmit power constraint. To this end, we consider a dual form of this optimization by conceptually lifting the power constraint into the objective as a penalty term, which represents the reward to be obtained by the distributed multi-agent DDQN algorithm. Specifically, the reward function is given by$$r=W\frac{{C}^{d}(\mathrm{\Omega},\theta )}{\overline{{C}^{d}}}+(1-W)\frac{{E}^{h}(\mathrm{\Omega},\theta )}{\overline{{E}^{h}}}-{W}_{c}{P}_{sum}$$
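As a direct transcription of the reward above, the following one-line function may help make the roles of the weight $W$, the penalty weight $W_c$, and the normalizers $\overline{C^d}$ and $\overline{E^h}$ concrete; the default argument values are placeholders, not values from the paper.

```python
def reward(c_d, e_h, p_sum, w=0.5, w_c=0.01, c_d_bar=1.0, e_h_bar=1.0):
    """Weighted-sum utility of normalized rate and harvested energy,
    minus a penalty proportional to the total transmit power."""
    return w * c_d / c_d_bar + (1.0 - w) * e_h / e_h_bar - w_c * p_sum
```

Larger $W$ favors spectral efficiency over energy harvesting, while $W_c$ discourages actions that violate the lifted power constraint.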
 (3) State: Conventionally, a state in the MDP for RL-based algorithms is designed to represent the environmental information perceived by an agent. Although the related works, such as [39,40,48], share the same aim of representing as much available environmental information as possible, the different problems involved lead them to realize their state spaces differently. Here, to construct a state for this algorithm, an agent or BS k at time t provides the local information about its direct link k at the previous time slot $t-1$ to its interferers $j\in {I}_{k}(t),\forall j$, including (1) the interference power received from j, $|{\mathit{h}}_{k,j}^{\dagger}(t-1){\mathit{\omega}}_{j}(t-1)|^{2}$; (2) the interference-plus-noise power, ${\sum}_{l\ne k}|{\mathit{h}}_{k,l}^{\dagger}(t-1){\mathit{\omega}}_{l}(t-1)|^{2}+{\sigma}^{2}$; (3) the achievable data rate, ${C}_{k}^{d}(t-1)$; and (4) the channel gain, ${\mathit{h}}_{k,k}^{\dagger}(t){\overline{\mathit{\omega}}}_{k}(t-1)$. At the same time, it also sends to its interfered neighbors $i\in {O}_{k}(t),\forall i$ the index ${\ell}_{k}(t-1)$ of the adopted beam direction ${\overline{\mathit{\omega}}}_{k}(t-1)$ and the achievable data rate ${C}_{k}^{d}(t-1)$.
 the normalized identity of the BS, $k/{N}_{b}^{l}$;
 the normalized channel gain, $|{\mathit{h}}_{k,k}^{\dagger}(t){\overline{\mathit{\omega}}}_{k}(t-1)|^{2}/{N}_{c}^{l}$;
 the normalized interference-plus-noise power, $({\sum}_{l\ne k}|{\mathit{h}}_{k,l}^{\dagger}(t){\mathit{\omega}}_{l}(t-1)|^{2}+{\sigma}^{2})/{N}_{i}^{l}$;
 the normalized reward, $(W\frac{{C}_{k}^{d}(t-1)}{\overline{{C}^{d}}}+(1-W)\frac{{E}_{k}^{h}(t-1)}{\overline{{E}^{h}}}-{W}_{c}{P}_{sum}(t-1))/{N}_{r}^{l}$,
 the normalized identity of the interferer BS, $j/{N}_{b}^{i}$;
 the normalized beam direction index adopted by the interferer BS, ${\ell}_{j}(t-1)/{N}_{i}^{i}$;
 the normalized interference power, $|{\mathit{h}}_{k,j}^{\dagger}(t-1){\mathit{\omega}}_{j}(t-1)|^{2}/{N}_{c}^{i}$;
 the normalized utility, $(W\frac{{C}_{j}^{d}(t-1)}{\overline{{C}^{d}}}+(1-W)\frac{{E}_{j}^{h}(t-1)}{\overline{{E}^{h}}})/{N}_{u}^{i}$,
 the normalized channel gain, $|{\mathit{h}}_{i,i}^{\dagger}(t-1){\overline{\mathit{\omega}}}_{i}(t-1)|^{2}/{N}_{c}^{n}$;
 the normalized utility, $(W\frac{{C}_{i}^{d}(t-1)}{\overline{{C}^{d}}}+(1-W)\frac{{E}_{i}^{h}(t-1)}{\overline{{E}^{h}}})/{N}_{u}^{n}$;
 the normalized SINR with respect to k, $\frac{|{\mathit{h}}_{i,k}^{\dagger}(t-1){\mathit{\omega}}_{k}(t-1)|^{2}}{{\sum}_{l\ne i}|{\mathit{h}}_{i,l}^{\dagger}(t-1){\mathit{\omega}}_{l}(t-1)|^{2}+{\sigma}^{2}}/{N}_{s}^{n}$;
 the normalized totally received power, $({\sum}_{\forall l}|{\mathit{h}}_{i,l}^{\dagger}(t-1){\mathit{\omega}}_{l}(t-1)|^{2}+{\sigma}^{2})/{N}_{e}^{n}$,
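The normalized feature groups listed above (local, interferer, and interfered-neighbor) can be concatenated into one fixed-length state vector per agent. The sketch below is our own illustration: the caps of five interferer and five interfered neighbors, the zero-padding of absent neighbors, and the function name are assumptions, though four features per group with up to five neighbors of each kind is consistent with the state size of 44 given in the simulation settings.

```python
import numpy as np

def build_state(local, interferers, interfered, max_in=5, max_out=5, dim=4):
    """Assemble a fixed-length state: `dim` local features, then up to
    `max_in` interferer and `max_out` interfered-neighbor groups of `dim`
    normalized features each, zero-padding slots with no neighbor."""
    state = list(local)
    for group, cap in ((interferers, max_in), (interfered, max_out)):
        for feats in group[:cap]:          # truncate if there are too many neighbors
            state.extend(feats)
        state.extend([0.0] * (dim * (cap - min(len(group), cap))))  # pad the rest
    return np.asarray(state, dtype=np.float32)
```

A fixed-length, padded layout is what lets every agent feed the same DNN input size regardless of how many neighbors it currently observes.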
 (4) Selection policy and experience replay: Apart from the MDP, the DDQN algorithm also adopts the mechanisms usually found in DQN, such as the $\epsilon$-greedy selection policy and experience replay. First, by using the $\epsilon$-greedy selection policy, each agent explores the environment with probability $\epsilon$ and exploits with probability $1-\epsilon$, where $\epsilon$ is a hyperparameter for the trade-off between exploration and exploitation and decays at a rate of ${\lambda}_{\epsilon}$ to its minimum value ${\epsilon}_{min}$, similar to that in [51]. Further, by means of experience replay, each agent k stores its transitions $({\mathbf{s}}_{k}(t),{\mathbf{a}}_{k}(t),{r}_{k}(t),{\mathbf{s}}_{k}^{\prime})$ in a buffer memory ${D}_{k}$ and then randomly samples ${D}_{k}$ to construct a mini-batch for training its DNNs through, e.g., a stochastic gradient descent (SGD) algorithm, updating the weights ${\varphi}_{1}$ and ${\varphi}_{2}$ of ${Q}_{train}$ and ${Q}_{target}$, respectively. As a summary, the proposed multi-agent DDQN algorithm is shown in Algorithm 2 for reference.
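The three mechanisms just described ($\epsilon$-greedy selection, the replay buffer $D_k$, and the double-Q target that distinguishes $Q_{train}$ from $Q_{target}$) can be sketched as below. This is a generic illustration of the standard techniques, not the paper's Algorithm 2; the class and function names and the default capacity/batch sizes are our assumptions (the simulation settings do list a buffer size of 500 and a batch size of 32).

```python
import random
from collections import deque
import numpy as np

class ReplayBuffer:
    """FIFO memory D_k of transitions (s, a, r, s')."""
    def __init__(self, capacity=500):
        self.buf = deque(maxlen=capacity)  # old transitions are evicted automatically

    def push(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))

    def sample(self, batch_size=32):
        # Uniform random mini-batch for an SGD update.
        return random.sample(list(self.buf), min(batch_size, len(self.buf)))

def epsilon_greedy(q_values, eps):
    """Explore with probability eps, otherwise exploit the greedy action."""
    if random.random() < eps:
        return random.randrange(len(q_values))
    return int(np.argmax(q_values))

def ddqn_target(r, q_train_next, q_target_next, gamma):
    """Double DQN: select the next action with the online net Q_train,
    but evaluate it with the target net Q_target."""
    a_star = int(np.argmax(q_train_next))
    return r + gamma * q_target_next[a_star]
```

Decoupling selection from evaluation in `ddqn_target` is what distinguishes DDQN from vanilla DQN and mitigates its overestimation bias.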
Algorithm 2 Multi-agent DDQN algorithm.

6. Numerical Experiments
6.1. Simulation Setup
6.2. Parametric Analysis
6.2.1. The Number of Power Levels
6.2.2. The Number of Beam Directions
6.2.3. The Number of Power Splitting Ratios (PR)
6.3. Performance Comparison
 Global state information-based scheme: In principle, this scheme is the same as the distributed multi-agent DDQN algorithm. However, instead of adopting only its own state ${\mathit{s}}_{k}$, each agent k adopts the full state information, i.e., $\left\{{\mathit{s}}_{1},{\mathit{s}}_{2},\dots ,{\mathit{s}}_{L}\right\}$, for its own DDQN operations, following the concept of centralized training distributed executing (CTDE). Clearly, collecting such information would require a centralized processor or a full information exchange mechanism in the network; thus, the scheme is denoted as “gloDDQN”, as noted at the beginning of this section.
 Single-agent DRL scheme: As a branch of machine learning, DRL is conventionally developed with a single agent operated centrally in a processor. Here, the state-of-the-art RL algorithm, Advantage Actor-Critic, is adopted as a centralized DRL-based benchmark scheme for resolving the MOO problem and is simply denoted as “A2C”.
 Random-based scheme: As a baseline algorithm, this scheme leads each agent to randomly choose an action in each time slot and is denoted here as “random”.
 Greedy-based scheme: As another baseline algorithm, each agent in this scheme adopts the beam direction with the maximum channel gain and the maximum transmit power while randomly selecting its PR from the same set of ${N}_{eh}$ elements used by the DDQN. For easy reference, this scheme is denoted as “greedy” in the sequel.
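The greedy baseline above can be captured in a few lines; the sketch below is our own reading of the scheme, with the function name and argument layout as assumptions, and the beam gain computed as $|\mathbf{h}^{\dagger}\mathbf{c}_q|^2$ over the codebook columns.

```python
import random
import numpy as np

def greedy_action(h_kk, codebook, p_max, pr_set):
    """Greedy baseline: beam with maximum channel gain |h^dagger c_q|^2,
    maximum transmit power, and a uniformly random power-splitting ratio."""
    gains = np.abs(h_kk.conj() @ codebook) ** 2   # gain of each codebook column
    beam_idx = int(np.argmax(gains))
    return p_max, beam_idx, random.choice(pr_set)
```

Because the PR is still drawn at random, this baseline isolates the benefit of learning the beam/power choice from that of learning the splitting ratio.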
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Notations
References
 Ni, W.; Zheng, J.; Tian, H. Semi-federated learning for collaborative intelligence in massive IoT networks. IEEE Internet Things J. 2023.
 Wang, W.; Chen, J.; Jiao, Y.; Kang, J.; Dai, W.; Xu, Y. Connectivity-aware contract for incentivizing IoT devices in complex wireless blockchain. IEEE Internet Things J. 2023.
 Irmer, R.; Droste, H.; Marsch, P.; Grieger, M.; Fettweis, G.; Brueck, S.; Mayer, H.; Thiele, L.; Jungnickel, V. Coordinated multipoint: Concepts, performance, and field trial results. IEEE Commun. Mag. 2011, 49, 102–111.
 Rashid-Farrokhi, F.; Liu, K.J.R.; Tassiulas, L. Transmit beamforming and power control for cellular wireless systems. IEEE J. Sel. Areas Commun. 1998, 16, 1437–1450.
 3GPP TR 36.814. Evolved Universal Terrestrial Radio Access (E-UTRA); Further Advancements for E-UTRA Physical Layer Aspects. Available online: https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=2493 (accessed on 22 December 2022).
 López, O.L.A.; Alves, H.; Souza, R.D.; Montejo-Sánchez, S.; Fernández, E.M.G.; Latva-Aho, M. Massive wireless energy transfer: Enabling sustainable IoT toward 6G era. IEEE Internet Things J. 2021, 8, 8816–8835.
 Ku, M.L.; Li, W.; Chen, Y.; Liu, K.J.R. Advances in energy harvesting communications: Past, present, and future challenges. IEEE Commun. Surv. Tutor. 2016, 18, 1384–1412.
 Clerckx, B.; Zhang, R.; Schober, R.; Ng, D.W.K.; Kim, D.I.; Poor, H.V. Fundamentals of wireless information and power transfer: From RF energy harvester models to signal and system designs. IEEE J. Sel. Areas Commun. 2019, 37, 4–33.
 Zhang, R.; Ho, C.K. MIMO broadcasting for simultaneous wireless information and power transfer. IEEE Trans. Wirel. Commun. 2013, 12, 1989–2001.
 Shen, C.; Li, W.C.; Chang, T.H. Wireless information and energy transfer in multi-antenna interference channel. IEEE Trans. Signal Process. 2014, 62, 6249–6264.
 Zhou, X.; Zhang, R.; Ho, C.K. Wireless information and power transfer: Architecture design and rate-energy tradeoff. IEEE Trans. Commun. 2013, 61, 4754–4767.
 Kumar, D.; López, O.L.A.; Tölli, A.; Joshi, S. Latency-aware joint transmit beamforming and receive power splitting for SWIPT systems. In Proceedings of the 2021 IEEE 32nd Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), Helsinki, Finland, 13–16 September 2021; pp. 490–494.
 Oshaghi, M.; Emadi, M.J. Throughput maximization of a hybrid EH-SWIPT relay system under temperature constraints. IEEE Trans. Veh. Technol. 2020, 69, 1792–1801.
 Xu, J.; Zhang, R. Throughput optimal policies for energy harvesting wireless transmitters with non-ideal circuit power. IEEE J. Sel. Areas Commun. 2014, 32, 322–332.
 Ng, D.W.K.; Lo, E.S.; Schober, R. Wireless information and power transfer: Energy efficiency optimization in OFDMA systems. IEEE Trans. Wirel. Commun. 2013, 12, 6352–6370.
 Shi, Q.; Peng, C.; Xu, W.; Hong, M.; Cai, Y. Energy efficiency optimization for MISO SWIPT systems with zero-forcing beamforming. IEEE Trans. Signal Process. 2016, 64, 842–854.
 Vu, Q.D.; Tran, L.N.; Farrell, R.; Hong, E.K. An efficiency maximization design for SWIPT. IEEE Signal Process. Lett. 2015, 22, 2189–2193.
 Yu, H.; Zhang, Y.; Guo, S.; Yang, Y.; Ji, L. Energy efficiency maximization for WSNs with simultaneous wireless information and power transfer. Sensors 2017, 17, 1906.
 Wang, X.; Liu, J.; Zhai, C. Wireless power transfer-based multi-pair two-way relaying with massive antennas. IEEE Trans. Wirel. Commun. 2017, 16, 7672–7684.
 Wang, X.; Ashikhmin, A.; Wang, X. Wirelessly powered cell-free IoT: Analysis and optimization. IEEE Internet Things J. 2020, 7, 8384–8396.
 Lu, X.; Wang, P.; Niyato, D.; Kim, D.I.; Han, Z. Wireless networks with RF energy harvesting: A contemporary survey. IEEE Commun. Surv. Tutor. 2015, 17, 757–789.
 Huda, S.M.A.; Arafat, M.Y.; Moh, S. Wireless power transfer in wirelessly powered sensor networks: A review of recent progress. Sensors 2022, 22, 2952.
 LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
 Park, J.J.; Moon, J.H.; Lee, K.; Kim, D.I. Transmitter-oriented dual-mode SWIPT with deep-learning-based adaptive mode switching for IoT sensor networks. IEEE Internet Things J. 2020, 7, 8979–8992.
 Han, E.J.; Sengly, M.; Lee, J.R. Balancing fairness and energy efficiency in SWIPT-based D2D networks: Deep reinforcement learning based approach. IEEE Access 2022, 10, 64495–64503.
 Muy, S.; Ron, D.; Lee, J.R. Energy efficiency optimization for SWIPT-based D2D-underlaid cellular networks using multi-agent deep reinforcement learning. IEEE Syst. J. 2022, 16, 3130–3138.
 Al-Eryani, Y.; Akrout, M.; Hossain, E. Antenna clustering for simultaneous wireless information and power transfer in a MIMO full-duplex system: A deep reinforcement learning-based design. IEEE Trans. Commun. 2021, 69, 2331–2345.
 Zhang, R.; Xiong, K.; Lu, Y.; Gao, B.; Fan, P.; Letaief, K.B. Joint coordinated beamforming and power splitting ratio optimization in MU-MISO SWIPT-enabled HetNets: A multi-agent DDQN-based approach. IEEE J. Sel. Areas Commun. 2022, 40, 677–693.
 Sengly, M.; Lee, K.; Lee, J.R. Joint optimization of spectral efficiency and energy harvesting in D2D networks using deep neural network. IEEE Trans. Veh. Technol. 2021, 70, 8361–8366.
 Han, J.; Lee, G.H.; Park, S.; Choi, J.K. Joint subcarrier and transmission power allocation in OFDMA-based WPT system for mobile-edge computing in IoT environment. IEEE Internet Things J. 2022, 9, 15039–15052.
 Han, J.; Lee, G.H.; Park, S.; Choi, J.K. Joint orthogonal band and power allocation for energy fairness in WPT system with nonlinear logarithmic energy harvesting model. arXiv 2020, arXiv:2003.13255.
 Huang, J.; Xing, C.C.; Guizani, M. Power allocation for D2D communications with SWIPT. IEEE Trans. Wirel. Commun. 2020, 19, 2308–2320.
 Lu, W.; Liu, G.; Si, P.; Zhang, G.; Li, B.; Peng, H. Joint resource optimization in simultaneous wireless information and power transfer (SWIPT) enabled multi-relay Internet of Things (IoT) system. Sensors 2019, 19, 2536.
 Lee, K. Distributed transmit power control for energy-efficient wireless-powered secure communications. Sensors 2021, 21, 5861.
 Ehrgott, M. Multicriteria Optimization; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2005; Volume 491.
 Dinkelbach, W. On nonlinear fractional programming. Manag. Sci. 1967, 13, 492–498.
 Shen, K.; Yu, W. Fractional programming for communication systems, Part I: Power control and beamforming. IEEE Trans. Signal Process. 2018, 66, 2616–2630.
 Radaideh, M.I.; Du, K.; Seurin, P.; Seyler, D.; Gu, X.; Wang, H.; Shirvan, K. NEORL: NeuroEvolution optimization with reinforcement learning. arXiv 2021, arXiv:2112.07057.
 Nasir, Y.S.; Guo, D. Multi-agent deep reinforcement learning for dynamic power allocation in wireless networks. IEEE J. Sel. Areas Commun. 2019, 37, 2239–2250.
 Ge, J.; Liang, Y.C.; Joung, J.; Sun, S. Deep reinforcement learning for distributed dynamic MISO downlink-beamforming coordination. IEEE Trans. Commun. 2020, 68, 6070–6085.
 Bertsekas, D.P. Dynamic Programming and Optimal Control; Athena Scientific: Nashua, NH, USA, 1995; Volume 1.
 Tiong, T.; Saad, I.; Teo, K.T.K.; Lago, H.B. Deep reinforcement learning with robust deep deterministic policy gradient. In Proceedings of the 2020 Second International Conference on Electrical, Control and Instrumentation Engineering (ICECIE), London, UK, 31 August–3 September 2020; pp. 1–5.
 Fujimoto, S.; Hoof, H.V.; Meger, D. Addressing function approximation error in actor-critic methods. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 1587–1596.
 Hasselt, H.V.; Guez, A.; Silver, D. Deep reinforcement learning with double Q-learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; Volume 30.
 Ren, J.; Wang, H.; Hou, T.; Zheng, S.; Tang, C. Collaborative edge computing and caching with deep reinforcement learning decision agents. IEEE Access 2020, 8, 120604–120612.
 Nan, Z.; Jia, Y.; Ren, Z.; Chen, Z.; Liang, L. Delay-aware content delivery with deep reinforcement learning in Internet of Vehicles. IEEE Trans. Intell. Transp. Syst. 2022, 23, 8918–8929.
 Zou, W.; Cui, Z.; Li, B.; Zhou, Z.; Hu, Y. Beamforming codebook design and performance evaluation for 60 GHz wireless communication. In Proceedings of the 2011 11th International Symposium on Communications and Information Technologies (ISCIT), Hangzhou, China, 12–14 October 2011; pp. 30–35.
 Simsek, M.; Bennis, M.; Guvenc, I. Learning-based frequency- and time-domain inter-cell interference coordination in HetNets. IEEE Trans. Veh. Technol. 2015, 64, 4589–4602.
 Qiu, C.; Hu, Y.; Chen, Y.; Zeng, B. Deep deterministic policy gradient (DDPG)-based energy harvesting wireless communications. IEEE Internet Things J. 2019, 6, 8577–8588.
 Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 7–9 July 2015; Bach, F., Blei, D., Eds.; PMLR: Cambridge, MA, USA, 2015; pp. 448–456.
 Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; Riedmiller, M. Playing Atari with deep reinforcement learning. arXiv 2013, arXiv:1312.5602.
 Canese, L.; Cardarilli, G.C.; Nunzio, L.D.; Fazzolari, R.; Giardino, D.; Re, M.; Spano, S. Multi-agent reinforcement learning: A review of challenges and applications. Appl. Sci. 2021, 11, 4948.
 Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 1998.
Parameter  Value 

Number of neighboring cells (U)  5 
Noise power (${\sigma}^{2}$)  −114 dBm 
Standard deviation  8 dB 
Number of multipaths  4 
Time slot duration  20 ms 
Angular spread  3° 
Channel correlation coefficient  0.64 
Cell radius  20 m 
Maximum transmit power (${P}_{max}$)  38 dBm 
Minimum transmit power (${P}_{min}$)  0 
Number of transmit antennas in BS (${N}_{t}$)  4 
Number of transmit power levels (${N}_{p}$)  4, 8, 16 
Number of energy harvesting ratios (${N}_{eh}$)  4, 8, 16 
Number of beam directions (${N}_{code}$)  4, 8, 16 
Parameter  Value 

Learning rate  0.0005 
Greedy exploration parameter ($\u03f5$)  0.2 
Exploration decay rate (${\lambda}_{\u03f5}$)  0.0001 
Minimum exploration rate (${\u03f5}_{min}$)  0.01 
Greedy decay rate  0.0001 
Size of state for agent/BS k (${\mathit{s}}_{k}$)  44 
Size of action for agent/BS k (${\mathit{a}}_{k}$)  64 
Replay buffer size for agent/BS k (${\mathit{D}}_{k}$)  500 
Batch size for agent/BS k  32 
Normalization factors for local BS (${N}_{b}^{l},{N}_{c}^{l},{N}_{i}^{l},{N}_{r}^{l}$)  $(1,{10}^{4},{10}^{4},1)$ 
Normalization factors for interferer BS (${N}_{b}^{i},{N}_{i}^{i},{N}_{c}^{i},{N}_{u}^{i}$)  $(18,1,{10}^{4},10)$ 
Normalization factors for interfered BS (${N}_{c}^{n},{N}_{u}^{n},{N}_{s}^{n},{N}_{e}^{n}$)  $({10}^{4},1,10,{10}^{2})$ 
Liu, J.S.; Lin, C.H.; Hu, Y.C.; Donta, P.K. Joint Data Transmission and Energy Harvesting for MISO Downlink Transmission Coordination in Wireless IoT Networks. Sensors 2023, 23, 3900. https://doi.org/10.3390/s23083900