Joint Task Offloading and Resource Allocation for Intelligent Reflecting Surface-Aided Integrated Sensing and Communication Systems Using Deep Reinforcement Learning Algorithm
Abstract
1. Introduction
1.1. Related Works
1.2. Contributions
 We propose an IRS-assisted ISAC framework that exploits the IRS to assist and enhance sensing and communication in NLoS coverage areas. We construct a comprehensive optimization objective covering sensing, communication, and computation offloading. The goal is to maximize the data sum-rate while minimizing energy consumption, subject to radar performance, transmit power budget, and offloading delay constraints, through the joint design of the transmit beamforming and the IRS phase shifts.
 Because the optimization variables are coupled, the joint optimization problem is NP-hard and non-convex, which makes it difficult to solve with traditional mathematical methods. We therefore formulate it as an MDP and design two DRL schemes to solve it. To handle the continuous, high-dimensional action space, we develop a deep deterministic policy gradient (DDPG) scheme combined with prioritized experience replay (PER) to improve training efficiency. Furthermore, a twin delayed DDPG (TD3) scheme is designed on top of the DDPG framework.
 Simulation results confirm the effectiveness and convergence of the proposed schemes. Compared with the benchmarks, the proposed DRL schemes achieve a better balance between communication and sensing performance. Moreover, the system's energy consumption and latency are reduced through proper computation offloading. Finally, the benefits and feasibility of the IRS-assisted ISAC framework are verified.
2. System Model
2.1. Communication Model
2.2. Radar Sensing Model
2.3. Computation Offloading Model
3. Problem Formulation
3.1. Transmission Performance Optimization
3.2. System Energy Consumption Optimization
3.3. System-Comprehensive Performance Optimization
4. DRL-Based Joint Task Offloading and Resource Allocation Scheme
4.1. MDP Formulation
 ${\overline{\mathbf{H}}}_{1}\left(t\right)=\left[\mathrm{Re}\left\{{\mathbf{H}}_{1}\left(t\right)\right\},\mathrm{Im}\left\{{\mathbf{H}}_{1}\left(t\right)\right\}\right]$: the channel matrix ${\mathbf{H}}_{1}\left(t\right)$ is split into its real and imaginary parts, because the neural network cannot process complex values directly.
 ${\overline{\mathbf{H}}}_{2}\left(t\right)=\left[\mathrm{Re}\left\{{\mathbf{H}}_{2}\left(t\right)\right\},\mathrm{Im}\left\{{\mathbf{H}}_{2}\left(t\right)\right\}\right]$: in the same way, ${\mathbf{H}}_{2}\left(t\right)$ is split into its real and imaginary parts, where ${\mathbf{H}}_{2}\left(t\right)=\left\{{\mathbf{h}}_{k,2}\left(t\right)\mid k\in \mathcal{K}\right\}$.
 $\mathbf{p}\left(t\right)=\left\{\left[\mathrm{Re}\left\{{p}_{k}\left(t\right)\right\},\mathrm{Im}\left\{{p}_{k}\left(t\right)\right\}\right]\mid \forall k\in \mathcal{K}\right\}$: the transmit power of each UE, split into two parts before being fed to the training network, with ${p}_{k}\left(t\right)=\mathrm{Tr}\left({\mathbf{w}}_{k}{\mathbf{w}}_{k}^{H}\right)$.
 $\mathbf{d}\left(t\right)=\left[{d}_{k}\left(t\right)\mid \forall k\in \mathcal{K}\right]$: the size of the computation task generated at each UE.
 $\mathbf{a}(t-1)$: the action selected by the agent at the previous time step.
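As a rough illustration of the state elements listed above, the following Python sketch flattens complex-valued quantities into real-valued features and concatenates them with the task sizes and the previous action. The function names (`split_complex`, `build_state`), the feature ordering, and the toy dimensions are assumptions made for illustration only and are not taken from the authors' implementation.

```python
from typing import List, Sequence


def split_complex(values: Sequence[complex]) -> List[float]:
    """Return all real parts followed by all imaginary parts."""
    return [v.real for v in values] + [v.imag for v in values]


def flatten(matrix: Sequence[Sequence[complex]]) -> List[complex]:
    """Flatten a row-major matrix (list of rows) into one list."""
    return [entry for row in matrix for entry in row]


def build_state(h1, h2, p, d, prev_action) -> List[float]:
    """Concatenate [Re/Im of H1, Re/Im of H2, Re/Im of p, d, a(t-1)].

    The ordering is an assumption; any fixed ordering works as long as
    it is used consistently during training and inference.
    """
    state: List[float] = []
    state += split_complex(flatten(h1))
    state += split_complex(flatten(h2))
    state += split_complex(p)
    state += [float(x) for x in d]
    state += [float(x) for x in prev_action]
    return state


# Toy example: a 1x2 channel H1, a 1x1 channel H2, one UE.
example_state = build_state(
    h1=[[1 + 2j, 3 - 1j]],
    h2=[[0.5j]],
    p=[2.0],
    d=[1.5],
    prev_action=[0.1, 0.2],
)
```

Each complex entry contributes two real features, so the state dimension grows as twice the number of channel coefficients plus the remaining real-valued elements.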
4.2. An Improved DDPG-Based Joint Optimization Algorithm
Algorithm 1 PER DDPG-based Joint Task Offloading and Resource Allocation Algorithm.
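The body of Algorithm 1 is not reproduced in this text. As a hedged sketch of the prioritized experience replay (PER) component only, the Python class below follows the proportional variant described by Schaul et al. (cited in the reference list): transitions are sampled with probability proportional to their priority raised to a power, and importance-sampling weights correct the resulting bias. The class name and the values of `alpha`, `beta`, and `eps` are assumptions; the default buffer capacity (10,000) and minibatch size (16) are taken from the simulation parameter table.

```python
import random


class PrioritizedReplayBuffer:
    """Minimal proportional PER buffer (illustrative, not the authors' code)."""

    def __init__(self, capacity=10_000, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha      # assumed: how strongly priorities skew sampling
        self.eps = eps          # assumed: keeps every priority strictly positive
        self.transitions = []
        self.priorities = []
        self.pos = 0            # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions receive the current maximum priority.
        top = max(self.priorities, default=1.0)
        if len(self.transitions) < self.capacity:
            self.transitions.append(transition)
            self.priorities.append(top)
        else:
            self.transitions[self.pos] = transition
            self.priorities[self.pos] = top
        self.pos = (self.pos + 1) % self.capacity

    def probabilities(self):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        return [s / total for s in scaled]

    def sample(self, batch_size=16, beta=0.4, rng=random):
        probs = self.probabilities()
        n = len(probs)
        indices = rng.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights, normalized by their maximum.
        weights = [(n * probs[i]) ** (-beta) for i in indices]
        top = max(weights)
        weights = [w / top for w in weights]
        batch = [self.transitions[i] for i in indices]
        return indices, batch, weights

    def update(self, indices, td_errors):
        # Priority is the absolute TD error plus a small constant.
        for i, delta in zip(indices, td_errors):
            self.priorities[i] = abs(delta) + self.eps
```

In a training loop, the critic loss for each sampled transition would be multiplied by its weight, and `update` would be called with the new TD errors after each gradient step.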

4.3. Twin Delayed DDPG (TD3)-Based Joint Optimization Algorithm
 In the Input Step, two pairs of critic networks, ${Q}_{1}\left(s,a\mid {\omega}^{{Q}_{1}}\right)$ and ${Q}_{2}\left(s,a\mid {\omega}^{{Q}_{2}}\right)$, are given as input. In Step 1, the parameters of the two estimate critics and the two target critics are initialized as ${\omega}^{{Q}_{1}}$, ${\omega}^{{Q}_{2}}$, ${\omega}^{{Q}_{1}^{\prime}}$, and ${\omega}^{{Q}_{2}^{\prime}}$.
 Before turning to Step 17, the agent adopts a delayed update strategy so that the policy networks are updated less frequently than the value networks.
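The two modifications described above can be sketched in a few lines of Python. In TD3 as introduced by Fujimoto et al. (cited in the reference list), the target value uses the smaller of the two target-critic estimates, and the actor and target networks are updated only every few critic updates. The helper names and the `policy_delay` value of 2 are assumptions; the discount factor 0.7 and soft update factor 0.01 are the values listed in the simulation parameter table. Target policy smoothing, another part of TD3, is omitted here for brevity.

```python
def td3_target(reward, q1_next, q2_next, gamma=0.7, done=False):
    """Clipped double-Q target: bootstrap from the smaller critic estimate."""
    if done:
        return reward
    return reward + gamma * min(q1_next, q2_next)


def soft_update(target_params, online_params, tau=0.01):
    """Move target parameters slightly toward the online parameters."""
    return [(1.0 - tau) * t + tau * o
            for t, o in zip(target_params, online_params)]


def should_update_actor(step, policy_delay=2):
    """Delayed policy update: True only every `policy_delay` critic steps."""
    return step % policy_delay == 0


# With policy_delay=2, the actor is updated on steps 2, 4, 6, ...
actor_steps = [s for s in range(1, 7) if should_update_actor(s)]
```

Taking the minimum of the two critics counteracts the overestimation bias of a single critic, and the delayed update lets the value estimates settle before the policy is changed.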
5. Numerical Results
5.1. Convergence Performance
5.2. Performance Comparison
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
[1] ITU-R WP5D. Draft New Recommendation ITU-R M. [IMT. Framework for 2030 and Beyond]–Framework and Overall Objectives of the Future Development of IMT for 2030 and Beyond. 2023. Available online: https://www.itu.int/md/R19WP5D230612TD0905/ (accessed on 20 September 2023).
[2] Mishra, K.V.; Shankar, M.B.; Koivunen, V.; Ottersten, B.; Vorobyov, S.A. Toward millimeter-wave joint radar communications: A signal processing perspective. IEEE Signal Process. Mag. 2019, 36, 100–114.
[3] Kumari, P.; Vorobyov, S.A.; Heath, R.W. Adaptive virtual waveform design for millimeter-wave joint communication–Radar. IEEE Trans. Signal Process. 2019, 68, 715–730.
[4] Dokhanchi, S.H.; Mysore, B.S.; Mishra, K.V.; Ottersten, B. A mmWave automotive joint radar-communications system. IEEE Trans. Aerosp. Electron. Syst. 2019, 55, 1241–1260.
[5] Zhang, Q.; Sun, H.; Gao, X.; Wang, X.; Feng, Z. Time-Division ISAC Enabled Connected Automated Vehicles Cooperation Algorithm Design and Performance Evaluation. IEEE J. Sel. Areas Commun. 2022, 40, 2206–2218.
[6] Liu, X.; Zhang, H.; Long, K.; Zhou, M.; Li, Y.; Poor, H.V. Proximal Policy Optimization-Based Transmit Beamforming and Phase-Shift Design in an IRS-Aided ISAC System for the THz Band. IEEE J. Sel. Areas Commun. 2022, 40, 2056–2069.
[7] Solomitckii, D.; Heino, M.; Buddappagari, S.; Hein, M.A.; Valkama, M. Radar scheme with raised reflector for NLOS vehicle detection. IEEE Trans. Intell. Transp. Syst. 2021, 23, 9037–9045.
[8] Song, X.; Zhao, D.; Hua, H.; Han, T.X.; Yang, X.; Xu, J. Joint transmit and reflective beamforming for IRS-assisted integrated sensing and communication. In Proceedings of the 2022 IEEE Wireless Communications and Networking Conference (WCNC), Austin, TX, USA, 10–13 April 2022; pp. 189–194.
[9] Liu, F.; Cui, Y.; Masouros, C.; Xu, J.; Han, T.X.; Eldar, Y.C.; Buzzi, S. Integrated sensing and communications: Toward dual-functional wireless networks for 6G and beyond. IEEE J. Sel. Areas Commun. 2022, 40, 1728–1767.
[10] Rajatheva, N.; Atzeni, I.; Björnson, E.; Bourdoux, A.; Buzzi, S.; Doré, J.B.; Erkucuk, S.; Fuentes, M.; Guan, K.; Hu, Y.; et al. White paper on broadband connectivity in 6G. 2020. Available online: http://urn.fi/urn:isbn:9789526226798 (accessed on 2 October 2023).
[11] Shao, X.; You, C.; Ma, W.; Chen, X.; Zhang, R. Target sensing with intelligent reflecting surface: Architecture and performance. IEEE J. Sel. Areas Commun. 2022, 40, 2070–2084.
[12] Liu, X.; Huang, T.; Shlezinger, N.; Liu, Y.; Zhou, J.; Eldar, Y.C. Joint transmit beamforming for multiuser MIMO communications and MIMO radar. IEEE Trans. Signal Process. 2020, 68, 3929–3944.
[13] Jiang, Z.M.; Rihan, M.; Zhang, P.; Huang, L.; Deng, Q.; Zhang, J.; Mohamed, E.M. Intelligent Reflecting Surface Aided Dual-Function Radar and Communication System. IEEE Syst. J. 2022, 16, 475–486.
[14] Chu, Z.; Xiao, P.; Shojafar, M.; Mi, D.; Mao, J.; Hao, W. Intelligent Reflecting Surface Assisted Mobile Edge Computing for Internet of Things. IEEE Wirel. Commun. Lett. 2021, 10, 619–623.
[15] Sankar, R.P.; Chepuri, S.P. Beamforming in Hybrid RIS assisted Integrated Sensing and Communication Systems. In Proceedings of the 2022 30th European Signal Processing Conference (EUSIPCO), Belgrade, Serbia, 29 August–2 September 2022; pp. 1082–1086.
[16] Buzzi, S.; Grossi, E.; Lops, M.; Venturino, L. Foundations of MIMO Radar Detection Aided by Reconfigurable Intelligent Surfaces. IEEE Trans. Signal Process. 2022, 70, 1749–1763.
[17] Hua, M.; Wu, Q.; He, C.; Ma, S.; Chen, W. Joint Active and Passive Beamforming Design for IRS-Aided Radar-Communication. IEEE Trans. Wirel. Commun. 2023, 22, 2278–2294.
[18] He, Y.; Cai, Y.; Mao, H.; Yu, G. RIS-Assisted Communication Radar Coexistence: Joint Beamforming Design and Analysis. IEEE J. Sel. Areas Commun. 2022, 40, 2131–2145.
[19] Wang, X.; Fei, Z.; Huang, J.; Yu, H. Joint Waveform and Discrete Phase Shift Design for RIS-Assisted Integrated Sensing and Communication System Under Cramer-Rao Bound Constraint. IEEE Trans. Veh. Technol. 2022, 71, 1004–1009.
[20] Liu, R.; Li, M.; Liu, Y.; Wu, Q.; Liu, Q. Joint Transmit Waveform and Passive Beamforming Design for RIS-Aided DFRC Systems. IEEE J. Sel. Top. Signal Process. 2022, 16, 995–1010.
[21] Liao, C.; Wang, F.; Lau, V.K.N. Optimized Design for IRS-Assisted Integrated Sensing and Communication Systems in Clutter Environments. IEEE Trans. Commun. 2023, 71, 4721–4734.
[22] Huang, N.; Wang, T.; Wu, Y.; Wu, Q.; Quek, T.Q.S. Integrated Sensing and Communication Assisted Mobile Edge Computing: An Energy-Efficient Design via Intelligent Reflecting Surface. IEEE Wirel. Commun. Lett. 2022, 11, 2085–2089.
[23] Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971.
[24] LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
[25] François-Lavet, V.; Henderson, P.; Islam, R.; Bellemare, M.G.; Pineau, J. An introduction to deep reinforcement learning. Found. Trends Mach. Learn. 2018, 11, 219–354.
[26] Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
[27] Chen, J.; Xing, H.; Xiao, Z.; Xu, L.; Tao, T. A DRL Agent for Jointly Optimizing Computation Offloading and Resource Allocation in MEC. IEEE Internet Things J. 2021, 8, 17508–17524.
[28] Meng, F.; Chen, P.; Wu, L.; Cheng, J. Power Allocation in Multi-User Cellular Networks: Deep Reinforcement Learning Approaches. IEEE Trans. Wirel. Commun. 2020, 19, 6255–6267.
[29] Cheng, M.; Li, J.; Nazarian, S. DRL-cloud: Deep reinforcement learning-based resource provisioning and task scheduling for cloud service providers. In Proceedings of the 2018 23rd Asia and South Pacific Design Automation Conference (ASP-DAC), Jeju, Korea, 22–25 January 2018; pp. 129–134.
[30] Huang, C.; Mo, R.; Yuen, C. Reconfigurable Intelligent Surface Assisted Multiuser MISO Systems Exploiting Deep Reinforcement Learning. IEEE J. Sel. Areas Commun. 2020, 38, 1839–1850.
[31] Pereira-Ruisánchez, D.; Fresnedo, Ó.; Pérez-Adán, D.; Castedo, L. Joint Optimization of IRS-assisted MU-MIMO Communication Systems through a DRL-based Twin Delayed DDPG Approach. In Proceedings of the 2022 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB), Bilbao, Spain, 15–17 June 2022; pp. 1–6.
[32] You, C.; Zhang, R. Wireless Communication Aided by Intelligent Reflecting Surface: Active or Passive? IEEE Wirel. Commun. Lett. 2021, 10, 2659–2663.
[33] Xu, S.; Du, Y.; Zhang, J.; Liu, J.; Wang, J.; Zhang, J. Intelligent Reflecting Surface Enabled Integrated Sensing, Communication and Computation. IEEE Trans. Wirel. Commun. 2023. early access.
[34] Dinh, T.Q.; Tang, J.; La, Q.D.; Quek, T.Q.S. Offloading in Mobile Edge Computing: Task Allocation and Computational Frequency Scaling. IEEE Trans. Commun. 2017, 65, 3571–3584.
[35] Wang, C.; Liang, C.; Yu, F.R.; Chen, Q.; Tang, L. Computation Offloading and Resource Allocation in Wireless Cellular Networks With Mobile Edge Computing. IEEE Trans. Wirel. Commun. 2017, 16, 4924–4938.
[36] Mao, Y.; Zhang, J.; Song, S.H.; Letaief, K.B. Stochastic Joint Radio and Computational Resource Management for Multi-User Mobile-Edge Computing Systems. IEEE Trans. Wirel. Commun. 2017, 16, 5994–6009.
[37] Zhou, F.; Wu, Y.; Hu, R.Q.; Qian, Y. Computation Rate Maximization in UAV-Enabled Wireless-Powered Mobile-Edge Computing Systems. IEEE J. Sel. Areas Commun. 2018, 36, 1927–1941.
[38] Feriani, A.; Hossain, E. Single and multi-agent deep reinforcement learning for AI-enabled wireless networks: A tutorial. IEEE Commun. Surv. Tutor. 2021, 23, 1226–1252.
[39] Hou, Y.; Liu, L.; Wei, Q.; Xu, X.; Chen, C. A novel DDPG method with prioritized experience replay. In Proceedings of the 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Banff, AB, Canada, 5–8 October 2017.
[40] Schaul, T.; Quan, J.; Antonoglou, I.; Silver, D. Prioritized experience replay. arXiv 2015, arXiv:1511.05952.
[41] Fujimoto, S.; Hoof, H.; Meger, D. Addressing function approximation error in actor-critic methods. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 1587–1596.
[42] Zhang, H.; Di, B.; Song, L.; Han, Z. Reconfigurable Intelligent Surfaces Assisted Communications With Limited Phase Shifts: How Many Phase Shifts Are Enough? IEEE Trans. Veh. Technol. 2020, 69, 4498–4502.
[43] Study on Channel Model for Frequencies from 0.5 to 100 GHz (Release 17). Document 3GPP TR 38.901. v17.0.0. 2022. Available online: https://www.3gpp.org/DynaReport/38901.htm (accessed on 10 September 2023).
[44] Basar, E.; Yildirim, I. Reconfigurable Intelligent Surfaces for Future Wireless Networks: A Channel Modeling Perspective. IEEE Wirel. Commun. 2021, 28, 108–114.
[45] Wang, Z.; Wei, Y.; Yu, F.R.; Han, Z. Utility Optimization for Resource Allocation in Multi-Access Edge Network Slicing: A Twin-Actor Deep Deterministic Policy Gradient Approach. IEEE Trans. Wirel. Commun. 2022, 21, 5842–5856.
Ref.  Phases  Users  Targets  Radar Paths  Method 

[13]  Continuous  Single  Single  LoS, NLoS  MM 
[8]  Continuous  Single  Multiple  NLoS  SDR 
[15]  Continuous  Multiple  Multiple  LoS  AO 
[18]  Continuous  Single  Single  LoS, NLoS  PDD, BCD 
[19]  Discrete  Multiple  Multiple  LoS  AO 
[20]  Continuous  Multiple  Single  LoS, NLoS  ADMM, AO 
[21]  Discrete  Multiple  Multiple  NLoS  SDR 
[22]  Continuous  Single  Multiple  LoS, NLoS  BCD 
This paper  Continuous  Multiple  Multiple  NLoS  DRL 
Parameter  Description  Value 

M  Number of antennas at BS  8 
$N\times N$  Number of IRS elements  64 
K  Number of UEs  8 
${P}^{max}$  Power budget of BS  10 dB 
${p}_{k}$  Transmit power of the UE  30 dBm 
${\sigma}^{2}$  Noise variance  −85 dBm 
${B}_{k}$  Bandwidth allocated to UE k  2 MHz 
${d}_{k}$  Input data size of task  $U[1,2]$ Mbits 
${c}_{k}$  Required computation cost  $U[1,2]$ Kcycles/bit 
${F}_{o}^{\mathrm{tol}}$  CPU frequency of BS server  10 Gcycles/s 
${f}_{l,k}$  CPU frequency of UE  $U[1,2]$ Gcycles/s 
${\xi}_{k}$  Maximum tolerable latency  100 ms 
${\kappa}_{o},{\kappa}_{l}$  Effective capacitance coefficient  ${10}^{-26}$, $3\times {10}^{-26}$ 
${\alpha}_{\mu},{\alpha}_{Q}$  Learning rate for actor and critic networks  0.001, 0.001 
$\gamma $  Discount factor  0.7 
$\epsilon$  Soft update factor  0.01 
$\mathcal{M}$  Capacity of experience buffer  10,000 
J  Capacity of minibatch  16 
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yang, L.; Wei, Y.; Wang, X. Joint Task Offloading and Resource Allocation for Intelligent Reflecting Surface-Aided Integrated Sensing and Communication Systems Using Deep Reinforcement Learning Algorithm. Sensors 2023, 23, 9896. https://doi.org/10.3390/s23249896