Article

An On-Demand Partial Charging Algorithm without Explicit Charging Request for WRSNs

1 College of Computer Science, Sichuan University, Chengdu 610065, China
2 College of Communication Engineering (College of Microelectronics), Chengdu University of Information Technology, Chengdu 610225, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(20), 4343; https://doi.org/10.3390/electronics12204343
Submission received: 19 September 2023 / Revised: 2 October 2023 / Accepted: 17 October 2023 / Published: 19 October 2023
(This article belongs to the Section Networks)

Abstract

Wireless rechargeable sensor networks provide an effective solution to the energy limitation problem in wireless sensor networks by introducing chargers to recharge the nodes. On-demand charging algorithms, which schedule the mobile charger to charge the most energy-scarce node based on the node’s energy status, are one of the main types of charging scheduling algorithms for wireless rechargeable sensor networks. However, most existing on-demand charging algorithms require a predefined charging request threshold to prompt energy-starved nodes with energy levels lower than this threshold to submit an explicit charging request to the base station so that the base station can schedule the mobile charger to charge these nodes. These algorithms ignore the difference in importance of nodes in the network, and charging requests sent by nodes independently can interfere with the mobile charger’s globally optimal scheduling. In addition, forwarding charging requests in the network increases the network burden. In this work, aiming to maximize the network revenue and the charging efficiency, we investigate the problem of scheduling the mobile charger on-demand without depending on explicit charging requests from nodes (SWECR). We propose a novel on-demand partial charging algorithm that does not require explicit charging requests from nodes. Our algorithm accounts for the differences in importance between nodes and leverages the deep reinforcement learning technique to determine the target charging node and each node’s charging time. The simulation results demonstrate that the proposed algorithm significantly improves the charging performance and maximizes the network revenue and the charging efficiency.

1. Introduction

With the rapid development of the Internet of Things (IoT), the applications of wireless sensor networks (WSNs) are becoming more and more widespread. They have shown great value in many fields, such as military warfare, medical health, agricultural production, logistics tracking, environmental protection, and industrial monitoring [1,2,3,4]. Typically, a wireless sensor network is composed of a multitude of sensor nodes that are randomly distributed throughout the monitoring area [5]. Due to the size constraints of the sensors, most of them are powered by small-capacity batteries, resulting in limited life cycles for WSNs [6]. Fortunately, the emergence of wireless energy transfer technology [7] has facilitated the provision of stable and continuous energy replenishment for WSNs, which are referred to as wireless rechargeable sensor networks (WRSNs) [8].
The primary research problem in WRSNs is how to charge sensors by efficiently scheduling the mobile charger (MC). On-demand charging algorithms are the most common charging scheduling algorithms in WRSNs because they can increase network lifetime and performance. Most existing on-demand charging algorithms adopt a predefined charging request threshold for sensors [9]: a sensor sends an explicit charging request when its residual energy falls below the threshold, and the base station (BS) schedules the MC to charge these nodes. However, such charging algorithms are severely limited by the predefined threshold and the explicit charging request. Most of them neglect the varying impact on network performance when different sensors fail [10]. In practice, some nodes are significantly more critical than ordinary data-gathering nodes; they perform more tasks than other nodes and therefore consume energy at a higher rate. As a result, these high-consumption nodes may run out of energy soon after submitting their charging requests, and their failure degrades network performance more than the energy exhaustion of ordinary nodes.
On the other hand, subject to the predefined charging request threshold, the charging requests issued by nodes based on their energy levels rather than from the perspective of the whole network may lead to the scheduling of the MC that is not globally optimal. For example, explicit charging requests may cause the MC to waste energy by moving back and forth between multiple sensors. As shown in Figure 1a, the MC charges the nodes that send explicit charging requests, and each node’s remaining lifetime is labeled next to the node. Without loss of generality, we assume that the charging request is sent when each sensor’s residual lifetime is below 60 min, and thus the MC prioritizes charging nodes 1 and 5, which send charging requests, while ignoring nodes 2 and 3, which are closer to the MC and have equally short remaining lifetimes. Meanwhile, since the WRSN is a multi-hop network, the forwarding of charging requests through the network may further increase the energy burden on the network [11]. Furthermore, most of the existing studies utilize the full charging mode, where the sensors are charged to their full battery capacity. This approach is limited by the wireless energy transfer power, resulting in prolonged charging times [12]. In contrast, partial charging allows sensors to receive sufficient energy to operate until the next charging process. The MC can charge more sensors before the deadline, which will greatly reduce the number of dead sensors in the network and increase the network revenue.
To overcome these challenges, we propose a novel on-demand partial charging algorithm. Different from on-demand charging algorithms that use a predefined charging request threshold and full charging, such as the algorithm in [13], our proposed algorithm takes the sensors’ residual lifetime as the charging deadline and utilizes a partial charging mode to schedule the MC to charge sensors based on their different importance. We optimize the MC’s scheduling scheme in terms of the entire network rather than individual nodes themselves by having the BS monitor the network state and deferring the decision of scheduling the MC to perform charging tasks to the BS. As shown in Figure 1b, our algorithm can globally optimize the MC scheduling scheme to avoid inefficient movement of the MC in the network. It should be noted that the dynamic scheduling problem of the MC is limited by the MC’s battery capacity, while the high-energy-consuming nodes in the network may need to be recharged multiple times within a short period, so the algorithms used to solve the traditional traveling salesman problem (TSP) and vehicle routing problem (VRP) do not solve such a problem well. We employ reinforcement learning, a subfield of machine learning where the agent learns optimal behavior by interacting with its environment to achieve its goals. In our case, the agent is the MC, and the environment is the WRSN. The MC continuously optimizes its strategy as it interacts with the environment, ultimately planning an optimal charging tour to maximize the network revenue and the charging efficiency. Specifically, our main contributions can be summarized as follows.
(1)
We propose an efficient on-demand partial charging algorithm, in which the charging priority of the nodes is ranked by the base station based on the state information of sensor nodes and the possible impact of their failure on network performance; then, the base station develops a separate charging scheme for each node to maximize the network revenue and the charging efficiency.
(2)
To the best of our knowledge, the proposed algorithm is the first on-demand charging algorithm that does not specify a charging request threshold for nodes. As a result, instead of passively waiting for nodes to transmit charging requests, the MC can select any sensor node as the charging target adaptively and dynamically. Furthermore, our proposed approach employs a partial charging mode to provide charging services for sensors, with reinforcement learning used to determine both the target node and the charging amount.
(3)
We conduct comprehensive simulation experiments to assess the effectiveness of the proposed algorithm. The simulation results demonstrate that the proposed algorithm outperforms existing algorithms in terms of maximizing the network revenue and enhancing the charging performance.
The rest of this paper is organized as follows. Section 2 provides a brief overview of related work. Section 3 presents the system model, notations, notions, and problem definition. Section 4 describes the proposed algorithm in detail. Section 5 evaluates the proposed algorithm. Section 6 discusses research entry points for future work. Finally, Section 7 concludes the paper.

2. Related Work

In recent years, researchers have investigated the scheduling problem in WRSNs, which can be categorized into two main approaches: periodical charging and on-demand charging. In periodical charging, the BS plans a charging tour for the MC before the charging task. The MC travels along a predetermined path and periodically charges the sensors [14,15]. Meanwhile, in on-demand charging, once the residual energy of a sensor falls below a threshold, the sensor will send a charging request to the BS [16,17]. Then, the BS schedules the MC to provide charging services for the nodes according to some predefined rules.
To balance the charging efficiency and the charging throughput of the MC, Chen et al. [18] proposed an adaptive real-time on-demand charging scheduling scheme, which classifies charging requests into different categories according to their urgency. He et al. [19] proposed a novel tree-based charging scheme, which determines the residual energy threshold for robots to send charging requests through a queue-based approach and then charges the robots according to a tree-based schedule. For the optimization objective of maximizing the energy utilization efficiency and node survival rate, Dong et al. [20] proposed an instant on-demand charging strategy. The strategy combines the temporal, spatial, and event characteristics of the nodes into a charging priority metric, and, based on this metric, it ranks the nodes’ instant charging requests. In [12], Wang et al. formulated the charging optimization problem as a multiple traveling salesman problem with deadlines, and they proposed a heuristic algorithm for dynamic real-time charging. The authors in [9] proposed an on-demand multi-node charging scheme that follows a partial charging mode. The scheme selects the optimal halting points of MCs by integrating a non-dominated sorting genetic algorithm and multi-attribute decision-making approach. Then, it determines the partial charging time for each halting point based on a partial charging timer. Gao et al. [21] decoupled the energy–time joint optimization problem into ERAMCCS-Energy and ERAMCCS-Time subproblems. They found the charging schedule that satisfied the node energy demand with the minimum energy loss and minimum time span. Dudyala et al. [22] proposed an MC scheduling algorithm based on sensor energy rate prediction to maximize the network lifetime. 
The authors in [23] proposed an energy-efficient tour optimization scheme that uses two MCs to perform the tasks of minimizing the variance of the nodes’ residual lifetime and enhancing the network lifetime, respectively. Lin et al. [24] proposed a temporal–spatial charging scheduling algorithm, which first ranks charging requests in increasing order of deadline and then adjusts the charging order to reduce the number of dead nodes. Das et al. [25] used a multi-objective-based genetic algorithm to optimize the joint data gathering and charging process, and their goal was to reduce the total dead periods of the sensors and the overall data-gathering delay. Cao et al. [13] used the time window to represent the charging demand of sensors and used the charging reward to measure the quality of sensor charging. They scheduled the MC to replenish the energy supply of sensors by maximizing the sum of charging rewards collected by the MC during the charging tour.
Existing studies have achieved some promising results, but these studies overlook the crucial issue of customizing charging schemes for sensor nodes based on their varying importance. All these approaches need to wait for explicit charging requests from the nodes before scheduling the MC for charging. In contrast to previous approaches, we present an innovative on-demand partial charging algorithm that does not require nodes to send charging requests but determines the optimal service time and charging amount for each node based on the importance of the nodes and the network state information. Our proposed algorithm can globally optimize MC scheduling to ensure the energy supply of key nodes and improve the charging efficiency while maximizing the network revenue.

3. Preliminaries

In this section, we first introduce the system model, notations, and notions, and then define the problem precisely. The main notations used in this paper are shown in Table 1.

3.1. System Model

We consider a wireless sensor network G_s = (V_s, E_s) deployed in a two-dimensional monitoring area, where V_s denotes the set of nodes and E_s represents the set of wireless links between nodes. The network consists of a BS and a set of N randomly deployed sensors, denoted as V_s = {v_0, v_1, …, v_N}, where v_0 denotes the BS, which is deployed in the center of the monitoring area. Each sensor v_i ∈ V_s is powered by a rechargeable battery with energy capacity b_i. The communication radius of both the sensors and the BS is r, and the Euclidean distance between v_i and v_j is denoted as d_{i,j}. Sensors and the BS can communicate with one another as long as they are within communication range of one another. Each sensor can sense, receive, and transmit data, and the sensing data rate of sensor v_i at timestamp t is denoted as dr_i(t). We assume that a routing protocol exists in G_s that delivers data in a multi-hop way along the path with the least energy consumption. We simulate the network data transmission process based on the communication energy consumption model in [26], and we assume that each sensor v_i ∈ V_s can observe its residual energy re_i(t) and the number of transmitted packets p_i(t). Using existing prediction techniques, sensor v_i can estimate its energy consumption rate ρ_i(t) [27], and its residual lifetime rl_i(t) can be estimated according to Equation (1):
rl_i(t) = re_i(t) / ρ_i(t)    (1)
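As a minimal sketch, the residual-lifetime estimate of Equation (1) can be computed as follows; the function and parameter names are illustrative, not from the paper:

```python
def residual_lifetime(residual_energy_j, consumption_rate_w):
    """Estimate a sensor's residual lifetime rl_i(t) = re_i(t) / rho_i(t).

    residual_energy_j: residual energy re_i(t) in joules.
    consumption_rate_w: predicted energy consumption rate rho_i(t) in watts (J/s).
    Returns the residual lifetime in seconds.
    """
    if consumption_rate_w <= 0:
        return float("inf")  # a node that drains no energy never expires under this estimate
    return residual_energy_j / consumption_rate_w
```

For instance, a node with 100 J of residual energy draining at 0.01 W is estimated to survive 10,000 s (about 2.8 h).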
To ensure the long-term operation of the sensor network G_s, we deploy an MC with energy capacity B at the BS to provide charging services. We assume that the relevant state information of v_i at timestamp t can be transmitted to the BS along with the sensing data, including the residual energy re_i, energy consumption rate ρ_i, residual lifetime rl_i, and number of transmitted packets p_i. We assume that the BS keeps a copy of the state information of all sensors and can obtain the MC's state information, as well as the result of each charging service, through remote communication. Unlike algorithms that must receive charging requests before scheduling the MC, in our algorithm, the BS decides the MC's departure time and the target sensor to be charged based on the states of all sensors and sends the scheduling scheme to the MC via remote communication.
During a charging cycle, the MC starts from the BS, moves to the vicinity of the target sensor for one-to-one charging, and finally returns to the BS when its energy is low. A sensor v_i can be partially charged several times during a charging cycle, and the maximum rechargeable amount at each charge is b_i − re_i. We assume that the MC consumes energy when moving, charging sensors, and communicating with the BS, while the energy dissipation during the energy transfer is negligible. The MC's charging power is μ, and it moves at a constant speed v. The energy consumption per unit distance is ξ, and the energy consumption of the MC for each instance of remote communication with the BS is ψ. After completing the current charging service, the MC will either stay in place and wait for the sensors to consume further energy, or move immediately to the next target sensor, depending on the network status. When the MC is low on energy, it returns to the BS for recharging and then proceeds to the next charging cycle once it is fully charged.
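Under this model, the energy and time cost of one charging service can be sketched as below, using the default parameter values given later in Section 5 (ξ = 600 J/m, v = 5 m/s, μ = 5 W); the function names are our own:

```python
def trip_energy(distance_m, charge_amount_j, xi_j_per_m=600.0, psi_j=0.0):
    """Energy the MC spends on one charging service: movement at xi J/m,
    the energy transferred to the sensor, and one remote report to the BS (psi J)."""
    return xi_j_per_m * distance_m + charge_amount_j + psi_j

def service_time(distance_m, charge_amount_j, speed_m_s=5.0, mu_w=5.0):
    """Time for the same service: travel at v m/s plus charging at power mu W."""
    return distance_m / speed_m_s + charge_amount_j / mu_w
```

Moving 100 m to transfer 2 kJ thus costs the MC 62 kJ and takes 420 s, which illustrates why minimizing travel distance dominates the charging efficiency.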

3.2. Problem Definition

Since the death of sensors causes the loss of network data and reduces the network revenue, we should try to recharge each sensor within its residual lifetime rl_i. However, when the network size becomes large, there may be many life-critical sensors at the same time, and it is difficult for the MC to recharge all of them in time. At the same time, more life-critical sensors mean that the MC needs to move frequently, so the MC consumes more energy on movement instead of using that energy for sensor replenishment. To improve the MC's charging efficiency, the travel distance of the MC should be minimized while avoiding the loss of network data due to sensor death. We define the charging cycle C of the MC, starting from and ending at the BS, as follows:
C = BS → (v_1, e_1) → (v_2, e_2) → … → (v_n, e_n) → BS    (2)
Here, v_i is the target sensor to be charged, selected by the MC based on the network state, and e_i is the amount of energy that the MC transfers to v_i. Any sensor can be charged multiple times in a charging cycle, with a maximum charging amount of b_i − re_i per charge. Since the MC's energy is divided into two main parts, the energy used for movement and the energy transmitted to sensors, maximizing the MC's charging efficiency means minimizing the distance that the MC moves. We define the total distance moved by the MC throughout the monitoring period as Dist_total, which is the sum of the distances traveled by the MC during the execution of the charging tasks. Moreover, we define the total amount of data lost throughout the monitoring period as Data_loss, which is the sum of the amount of data lost by nodes that are unable to communicate with the BS (both nodes that die because they cannot be recharged in time and active nodes that cannot relay their packets to the BS). Although the network revenue depends on many factors, wireless sensor networks are data-centric networks, so this paper focuses on the impact of data loss on the network revenue; to minimize the data loss, we need to minimize the number of nodes that are unable to communicate with the BS. Therefore, given a set of to-be-charged sensor nodes v_i ∈ C in each charging cycle C = BS → (v_1, e_1) → (v_2, e_2) → … → (v_n, e_n) → BS, we define the problem of scheduling the mobile charger on-demand without depending on explicit charging requests from nodes (SWECR) as follows:
minimize Data_loss,    (3)
and
minimize Dist_total,    (4)
subject to
e_i ≤ b_i − re_i    (5)
Σ_{v_i ∈ C} (et_i + e_i) + et_BS ≤ B    (6)
Our goal is to plan charging paths for the MC in each charging cycle that maximize the network revenue and the charging efficiency by minimizing both the data loss due to sensor death and the MC's energy consumption for movement. Constraint (5) ensures that each time the MC charges a sensor, the charging amount does not exceed the current energy demand b_i − re_i. Constraint (6) ensures that the MC starts from the BS at the beginning of each charging cycle and finally returns to the BS; et_i denotes the energy required to move the MC from its current location to the vicinity of sensor v_i, and et_BS denotes the energy required for the MC to return to the BS.
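The two constraints can be checked mechanically for a candidate charging cycle. The sketch below is our own illustration (names and data layout are assumptions, not from the paper), using the movement cost ξ in J/m:

```python
import math

def cycle_feasible(stops, bs_pos, battery_b, mc_capacity, xi=600.0):
    """Check a charging cycle BS -> (v_1, e_1) -> ... -> (v_n, e_n) -> BS.

    stops: list of ((x, y), residual_energy_j, charge_amount_j) in visiting order.
    bs_pos: (x, y) position of the BS; battery_b: sensor battery capacity b_i;
    mc_capacity: the MC's energy capacity B; xi: moving cost in J/m.
    """
    pos = bs_pos
    spent = 0.0
    for xy, re_i, e_i in stops:
        if e_i > battery_b - re_i:              # constraint (5): never overcharge a sensor
            return False
        spent += xi * math.dist(pos, xy) + e_i  # travel energy et_i plus transferred e_i
        pos = xy
    spent += xi * math.dist(pos, bs_pos)        # et_BS: energy to return to the BS
    return spent <= mc_capacity                 # constraint (6): cycle fits in the MC battery B
```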
Theorem 1.
The problem of scheduling the mobile charger on-demand without depending on explicit charging requests from nodes (SWECR) is NP-hard.
Proof. 
We consider a special case of the SWECR problem. In the general case, the MC can provide partial charging services to the same node several times during one charging cycle. In the special case, however, each node in the network can only be charged once before the end of its lifetime and must be fully charged in that single visit. We assume that there is a set of life-critical sensors in the network G_s that need to be charged at timestamp t. We assign a latency cost weight w_{i,j} = t_j + t_{i,j} to each edge in the edge set E_s of G_s, where t_j denotes the time to charge sensor v_j to the specified charging level and t_{i,j} denotes the time for the MC to move from v_i to v_j. A fully charged MC departs from the BS, provides one charging service for each life-critical sensor, and finally returns to the BS, and its objective is to obtain the loop that minimizes the cumulative latency cost weight sum. This special case can be formulated as the vehicle routing problem with deadlines (VRPD), where capacity-constrained vehicles depart from a distribution center and visit a set of customers with different cargo requirements within their deadlines, and each customer can only be served once. Therefore, the classical VRPD, with the service deadlines ranging from 0 to infinity and with only one vehicle departing from the depot to serve customers, can be considered a special case of the SWECR problem. Since the classical VRPD is NP-complete [28], it follows that the SWECR problem is NP-hard.   □

4. The Proposed VRPD-RL-PART Algorithm

In this section, we design an efficient on-demand partial charging algorithm (VRPD-RL-PART) to solve the SWECR problem based on the deep reinforcement learning technique. During the operation of the network, the algorithm running at the BS selects the time at which the MC provides charging services as well as the target node to be charged and the charging amount based on the network state information. When each charging service is completed, the MC sends the charging result to the BS via remote communication, and the BS optimizes the neural network parameters based on the reward obtained from the charging service. We train the agent using the Proximal Policy Optimization (PPO) algorithm [29], whose input to the policy network is the state of the sensor network, and the output is the probability of selecting different actions for the MC. In the following, we model the SWECR problem and present the design details of the proposed algorithm and the algorithm training process.

4.1. Problem Modeling

The MC performs an action a_t in state s_t and receives a reward r_t = r(s_t, a_t) from the environment, while the environmental state changes to s_{t+1}. Each step in the charging trajectory τ of the MC can be represented by the tuple (s_t, a_t, r_t, s_{t+1}), and the charging trajectory τ is denoted as (s_0, a_0, s_1, a_1, …, s_{T−1}, a_{T−1}, s_T), where s_0 is the initial state of the environment and s_T is the termination state, i.e., the state in which the network has operated for the full length of the monitoring period. The cumulative discounted reward R(τ) earned by the MC can be calculated using the following equation:
R(τ) = Σ_{t=0}^{T} γ^t · r(s_t, a_t)    (7)
γ ∈ (0, 1) is the discount factor that controls the importance of future rewards; a larger γ indicates that the agent prefers long-term rewards obtained in the future [13]. The optimization goal of the agent is to maximize the cumulative discounted reward R(τ) that it receives during its charging tour, and we obtain the optimal policy π by maximizing R(τ). The MC selects actions according to the policy π to plan the optimal charging tour that maximizes the network revenue and the charging efficiency. To obtain the optimal policy π, we define the state space, action space, reward function, and initial and termination states of the environment as follows.
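The discounted return of Equation (7) can be computed directly over a list of per-step rewards; this is a generic sketch, not code from the paper:

```python
def discounted_return(rewards, gamma=0.99):
    """Cumulative discounted reward R(tau) = sum_t gamma^t * r_t over one trajectory."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total
```

With gamma close to 1, rewards far in the future still contribute substantially, which matches the stated preference for long-term rewards.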
State space: The state space of the model is divided into two parts: the state information of all sensor nodes in the network and the state information of the MC. We define State = (S_network, S_MC) to denote the current state of the environment, where S_network = (s_1, s_2, …, s_N) denotes the current state of all sensors and S_MC denotes the current state of the MC. s_i denotes the state information of sensor v_i and has six parts: the horizontal coordinate of v_i, the vertical coordinate of v_i, the distance d_{i,MC} between v_i and the MC, the residual energy re_i of v_i, the residual lifetime rl_i of v_i, and the percentage p_i/N of all packets in the network transmitted by v_i. For example, s_i = (200.0, 200.0, 300.0, 100.0, 2.0, 30%) indicates that the horizontal coordinate of v_i is 200.0 m, the vertical coordinate is 200.0 m, the distance between v_i and the MC is 300.0 m, the residual energy of v_i is 100.0 J, the residual lifetime of v_i is 2.0 h, and v_i transmits 30% of all packets in the network. S_MC has three parts: the horizontal coordinate of the MC's current position, the vertical coordinate of the MC, and the residual energy of the MC. Thus, the dimension of the state space is 6 × N + 3, where N is the number of sensor nodes in the network.
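Flattening this state into a vector of length 6 × N + 3 might look like the following sketch; the dictionary keys are illustrative assumptions:

```python
import math

def build_state(sensors, mc, total_packets):
    """Flatten the network state into a vector of length 6*N + 3.

    sensors: list of dicts with keys x, y, re, rl, packets (per-sensor state).
    mc: dict with keys x, y, re (the MC's position and residual energy).
    total_packets: total number of packets transmitted in the network.
    """
    state = []
    for s in sensors:
        d = math.hypot(s["x"] - mc["x"], s["y"] - mc["y"])  # distance d_{i,MC}
        state += [s["x"], s["y"], d, s["re"], s["rl"], s["packets"] / total_packets]
    state += [mc["x"], mc["y"], mc["re"]]  # the 3 MC entries
    return state
```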
Action space: When the BS selects the next action for the MC, it must determine both the node to be visited and the charging amount, so the action space of the MC is a two-dimensional discrete action space A = (A_1, A_2). The MC determines the node to be visited in A_1 = {a_0, a_1, …, a_{N+1}}, where N is the number of sensors, and we use action_1 to denote the action selected by the MC in A_1. When action_1 = a_0, the MC returns to the BS to fully charge itself in preparation for the next charging cycle. When action_1 = a_i with a_i ∈ {a_1, a_2, …, a_N}, the MC visits the sensor v_i to charge it, and when action_1 = a_{N+1}, the MC stays in place, waiting for further energy consumption by the sensor nodes. Since each sensor has a battery capacity of b, to avoid overcharging, the maximum charging amount e_max cannot exceed b. We set the energy charging unit for partial charging to Δ and discretize the k (k > 1) possible charging sizes. The MC determines the partial charging amount in A_2 = {Δ, 2Δ, …, kΔ}, and we use action_2 to denote the action selected by the MC in A_2. The actual charging amount obtained by v_i is e_i = min(action_2, b_i − re_i), subject to the residual energy and the battery capacity of sensor v_i. Therefore, the dimension of the action space is (N + 2, k), where N is the number of sensor nodes.
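Decoding the two-dimensional discrete action into a concrete MC command could be sketched as below; the function and return labels are our own illustration:

```python
def decode_action(action1, action2_idx, sensors_re, battery_b, delta):
    """Map (action_1, action_2) to an MC command.

    action1: 0 -> return to BS; 1..N -> charge sensor v_i; N+1 -> wait in place.
    action2_idx: index into A_2 = {delta, 2*delta, ..., k*delta}.
    sensors_re: list of residual energies re_i; battery_b: capacity b; delta: charging unit.
    """
    n = len(sensors_re)
    if action1 == 0:
        return ("return_to_bs", 0.0)
    if action1 == n + 1:
        return ("wait", 0.0)
    i = action1 - 1
    requested = (action2_idx + 1) * delta
    e_i = min(requested, battery_b - sensors_re[i])  # e_i = min(action_2, b_i - re_i)
    return ("charge", e_i)
```

Note how the clamp enforces the partial-charging rule: a request of kΔ against a half-full battery yields only the remaining headroom b_i − re_i.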
Reward function: The reward function design is the key part of the deep reinforcement learning technique, where the MC receives a scalar reward R from the environment after performing a certain action. This reward value R is used to guide the MC to build a charging tour so that the MC can charge the sensors in time for each charging cycle to avoid data loss due to node death, while maximizing the charging efficiency. Therefore, the reward obtained by the MC after each executed action is negatively related to the amount of the network data loss and the distance traveled by the MC. The reward function is designed as follows:
r(s_t, a_t) = α · ln(1 − δ/N) + β · ((e_j − Δ)/(b − Δ)) · e^{−d_t/W}    (8)
r(s_t, a_t) is the reward value obtained by the MC after performing action a_t in state s_t, and α and β are the reward function weights. δ is the number of sensors that are unable to transmit packets to the BS, including dead sensors and active sensors that cannot reach the BS via routing, and N is the total number of sensors. e_j is the actual charging amount transmitted by the MC to the sensor v_j, which can be estimated from the charging model in [23]. To keep the MC from transmitting too little energy when partially charging sensors, we introduce the energy charging unit Δ to guide the MC to transmit more energy to sensors without affecting the network performance. b is the sensor battery capacity, d_t is the distance traveled by the MC, d_{i,MC} is the distance between the ith sensor and the MC, and W is calculated using Equation (9).
W = (1/N) · Σ_{i=1}^{N} d_{i,MC}    (9)
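A sketch of this reward computation follows. Note that the exact form of Equation (8) is our reconstruction from the surrounding description (a data-loss term α·ln(1 − δ/N) plus a distance-discounted charging term), and all names are illustrative:

```python
import math

def reward(n_unreachable, n_sensors, e_j, battery_b, delta_unit, d_t, dists_to_mc,
           alpha=1.0, beta=1.0):
    """Reward per Equation (8): penalize unreachable sensors, reward energy delivered,
    and discount by distance traveled relative to W, the mean sensor-to-MC distance (Eq. 9)."""
    w = sum(dists_to_mc) / len(dists_to_mc)               # Equation (9)
    loss_term = alpha * math.log(1.0 - n_unreachable / n_sensors)
    charge_term = beta * ((e_j - delta_unit) / (battery_b - delta_unit)) * math.exp(-d_t / w)
    return loss_term + charge_term
```

With no unreachable sensors, a full-capacity transfer, and zero travel, the reward reduces to β; any node loss pulls it negative through the log term.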
Initial and termination states: In the initial state, all sensors are able to communicate with the BS through routing relays. At the beginning of each charging cycle, an MC with full energy will depart from the BS. The BS will select actions to be performed by the MC according to the network state to provide charging services, and the MC will return to the BS to charge itself when it is low on energy. We time the network from the start of operation and terminate when the network runs for the set duration of the monitoring period. The workflow of the proposed algorithm is shown in Figure 2.

4.2. The Learning Algorithm

This section details our training process for obtaining the optimal policy π with the PPO algorithm. Compared with other deep reinforcement learning algorithms, the PPO algorithm improves stability and convergence speed by employing a clipping mechanism that keeps the update magnitude of each iteration small, which allows the MC to quickly learn how to plan charging paths that maximize the total collected charging reward and thereby achieve the goal of maximizing the network revenue and the charging efficiency. The clipping mechanism used by PPO also significantly reduces the computational complexity, and PPO has proven very effective on a variety of challenging tasks [30]. The PPO algorithm contains two neural networks: the actor network represents the policy π_θ, and the critic network represents the value function v_ψ, where θ and ψ are the parameters of the two networks, respectively. The actor network outputs the probability of the MC choosing each action, and the gradient descent algorithm is used to update the policy with the objective function J(θ), as shown in Equation (10).
J(θ) ≈ Σ_{(s_t, a_t)} min( (p_θ(a_t|s_t) / p_{θ_k}(a_t|s_t)) · A^{θ_k}(s_t, a_t), clip( p_θ(a_t|s_t) / p_{θ_k}(a_t|s_t), 1 − ε, 1 + ε ) · A^{θ_k}(s_t, a_t) )    (10)
To improve the sampling efficiency, the PPO algorithm introduces the importance sampling technique: the behavior strategy (old strategy) interacts with the environment to collect training samples, which are stored in a replay buffer, and once the buffer is full, the target strategy (new strategy) is updated by randomly sampling from it. After multiple updates, the parameters of the target strategy are copied to the behavior strategy to synchronize the two; the replay buffer is then emptied, the behavior strategy samples data again, and these steps are repeated until training is completed. In Equation (10), θ is the parameter of the target strategy and θ_k is the parameter of the behavior strategy. The PPO algorithm uses a clip function to limit the difference between the old and new strategies to a certain range, which reduces the fluctuation during strategy training. p_θ(a_t|s_t) and p_{θ_k}(a_t|s_t) are the probabilities of taking action a_t in state s_t under the new and old strategies, respectively, and the advantage A^{θ_k}(s_t, a_t) reflects how much better action a_t is than the other actions available in state s_t.
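The clipped surrogate of Equation (10) is straightforward to express in NumPy; this is a generic PPO-style sketch (log-probabilities and advantages as inputs), not the paper's implementation:

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Clipped surrogate objective: the probability ratio p_theta / p_theta_k is
    clipped to [1 - eps, 1 + eps] so one update cannot move the policy too far."""
    ratio = np.exp(logp_new - logp_old)            # p_theta(a|s) / p_theta_k(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return np.minimum(unclipped, clipped).mean()   # averaged over sampled (s_t, a_t)
```

When the new and old policies coincide, the ratio is 1 and the objective is simply the mean advantage; once the ratio leaves the trust interval, the clip caps any further gain from pushing it.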
The critic network outputs an estimate of the state value, which evaluates how good a state is. It uses the mean squared error as its loss function, with the goal of reducing the error between the estimated and true state values. In summary, the PPO algorithm optimizes the agent's strategy by increasing the selection probability of advantageous actions in the current state so as to maximize the cumulative discounted reward. The pseudo-code of the training process is shown in Algorithm 1.
Algorithm 1: On-demand partial charging algorithm using PPO
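The sampling-and-synchronization cycle summarized in Algorithm 1 can be sketched schematically as follows (a sketch only; `env_step`, `update_target`, and `sync` are hypothetical hooks standing in for environment interaction, the clipped-objective minibatch update, and the parameter copy):

```python
def ppo_training_loop(env_step, behavior_policy, update_target, sync,
                      buffer_size=128, updates_per_fill=4, iters=3):
    """Schematic PPO training cycle (not the full Algorithm 1).

    1. The behavior (old) strategy fills the replay buffer.
    2. The target (new) strategy is updated several times on the buffered
       samples; importance sampling makes the off-policy samples reusable.
    3. The target parameters are copied to the behavior strategy and the
       buffer is emptied before sampling resumes.
    """
    for _ in range(iters):
        buffer = [env_step(behavior_policy) for _ in range(buffer_size)]
        for _ in range(updates_per_fill):
            update_target(buffer)  # clipped-objective update of the target
        sync()                     # behavior strategy <- target strategy
        buffer.clear()
```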

5. Performance Evaluation

In this section, we conduct extensive simulation experiments to assess the performance of the proposed algorithm against recent related algorithms. We further investigate the effect of important parameters through multiple sets of comparison experiments, varying the number of sensors, the sensors' data rate, the charging power, the MC's energy capacity, and the sensors' battery capacity.

5.1. Simulation Environment

As shown in Table 2, we consider a WSN G_s = (V_s, E_s) consisting of 100 to 500 sensors randomly deployed in a 1000 m × 1000 m square area. The BS is located at (500 m, 500 m), the center of the area. The rechargeable battery capacity of each sensor v_i ∈ V_s is b = 10.8 kJ [31]. The data rate dr_i of sensor v_i is randomly chosen from an interval [dr_min, dr_max], where dr_min = 1 kbps and dr_max = 10 kbps [31]. We use a realistic sensor energy consumption model from [26] to simulate the energy consumed by sensor nodes during data transmission. To reduce the effect of randomness and compare the algorithms accurately, we fix dr_i to the midpoint of the interval; the proposed algorithm can easily be applied to networks in which dr_i varies randomly. We deploy an MC at the BS to charge the sensors; the energy capacity of the MC is B = 2000 kJ, its velocity is v = 5 m/s, its moving cost is ξ = 600 J/m, and its charging power is μ = 5 W [27]. To improve generalizability and simplify the calculation, we ignore the energy consumed by remote communication between the MC and the BS. The maximum amount of energy the MC can transfer to a sensor is e_max = b, limited by the sensor's battery capacity. To ensure that the MC transfers as much energy as possible on each visit, avoiding inefficient and frequent movements, we set the charging unit to Δ = 0.2b and discretize k = 5 possible charging sizes, i.e., the action space A_2 = {0.2b, 0.4b, 0.6b, 0.8b, b}. Because the proposed on-demand charging algorithm does not require sensors to send charging requests, no charging request threshold is set and the MC can charge the target sensor at any moment.
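The deployment and the discretized action space above can be reproduced with a short sketch (the helper names are ours; the parameter values follow Table 2):

```python
import random

AREA = 1000.0      # side length of the square deployment area (m)
B_SENSOR = 10.8e3  # sensor battery capacity b (J)
K = 5              # number of discretized charging sizes

def deploy_sensors(n, seed=0):
    """Randomly place n sensors in the area; the BS sits at (500, 500)."""
    rng = random.Random(seed)
    return [(rng.uniform(0, AREA), rng.uniform(0, AREA)) for _ in range(n)]

# Discretized partial-charging action space A_2 = {0.2b, 0.4b, 0.6b, 0.8b, b},
# built from the charging unit delta = 0.2b.
charging_sizes = [B_SENSOR * (i + 1) / K for i in range(K)]
```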
The monitoring period T_M of the simulation is one year, and, to ensure a fair comparison, each value in the simulation results is averaged over 10 network topologies generated with the same simulation parameters. Both the actor network and the critic network are fully connected multi-layer perceptrons (MLPs) with two hidden layers of 256 neurons each and the Tanh activation function. The simulation experiments are implemented in Python 3.8.13, and the neural networks are implemented with PyTorch 1.11.0.
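As a sketch of this architecture (the actual implementation uses PyTorch 1.11.0; here we show an equivalent NumPy forward pass with an assumed small random initialization):

```python
import numpy as np

def init_mlp(in_dim, out_dim, hidden=256, seed=0):
    """Two hidden layers of `hidden` units, as used for actor and critic."""
    rng = np.random.default_rng(seed)
    dims = [in_dim, hidden, hidden, out_dim]
    return [(0.01 * rng.standard_normal((a, b)), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def mlp_forward(x, params):
    """Tanh activations on the hidden layers, linear output layer:
    action scores for the actor, a state-value estimate for the critic."""
    h = x
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)
    W, b = params[-1]
    return h @ W + b
```

The actor's output is turned into action probabilities with a softmax, while the critic uses a single output unit for the state value.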

5.2. Baseline Setup

To accurately evaluate the performance of the proposed algorithm, we compare the VRPD-RL-PART algorithm proposed in this paper with four existing charging scheduling algorithms: the RMP-RL algorithm [13], the RMP-RL-PART algorithm (RMP-RL extended with the partial charging mode), the VRPD-EDF algorithm, and the VRPD-GA algorithm.
  • RMP-RL: The RMP-RL algorithm models the MC scheduling problem as a reward maximization problem (RMP). RMP-RL uses deep reinforcement learning to select the sensors to be charged. Its optimization goal is to minimize the number of sensor deaths and the distance traveled by the MC.
  • RMP-RL-PART: The RMP-RL-PART algorithm introduces the partial charging mode to solve the reward maximization problem, using deep reinforcement learning to select the sensors to be charged and the charging amount of partial charging.
  • VRPD-EDF: The VRPD-EDF algorithm models the MC scheduling problem as a VRPD and uses the Earliest Deadline First (EDF) rule to sort the sensors by their energy depletion deadlines; the MC visits the sensors in this order and fully charges each one.
  • VRPD-GA: The VRPD-GA algorithm models the MC scheduling problem as a VRPD and uses the genetic algorithm to plan the charging tour of the MC. The MC fully charges the sensors with the primary goal of providing charging services before the deadline of sensor energy depletion and the secondary goal of minimizing the distance traveled.
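For illustration, the deadline ordering behind the VRPD-EDF baseline can be sketched as follows (the tuple layout is a hypothetical simplification of the sensor state):

```python
def edf_order(sensors):
    """Sort sensors by energy-depletion deadline, earliest first.

    Each sensor is (node_id, residual_energy_J, consumption_rate_W);
    its deadline is residual energy divided by consumption rate.
    """
    return sorted(sensors, key=lambda s: s[1] / s[2])
```

The MC then visits the sensors in the returned order and fully charges each one.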
In this paper, four different performance metrics are used to evaluate the algorithm performance.
  • Loss Data (MB): The total amount of data lost by the WSN during the monitoring period T M .
  • Travel Distance (km): The total distance traveled by the MC during the monitoring period T M .
  • First Sensor Dead Time (hours): The duration of operation from the start of the simulation to the death of the first sensor in the network.
  • Number of Dead Sensors (times): The total number of sensor deaths during the monitoring period T_M (a dead sensor returns to active operation once it is recharged).
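These four metrics can be tracked with a small accumulator (a sketch with assumed field names; in the full simulation, lost data also accrues while a node remains dead):

```python
class Metrics:
    """Accumulators for the four evaluation metrics over a monitoring period."""

    def __init__(self):
        self.loss_data_mb = 0.0   # total data lost by the WSN (MB)
        self.travel_km = 0.0      # total distance traveled by the MC (km)
        self.first_dead_h = None  # time of the first sensor death (hours)
        self.dead_count = 0       # sensor deaths (nodes revive when recharged)

    def record_death(self, t_hours, lost_mb):
        self.dead_count += 1
        self.loss_data_mb += lost_mb
        if self.first_dead_h is None:
            self.first_dead_h = t_hours
```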

5.3. Performance Comparisons

We first compare the performance of the proposed algorithm with that of the existing algorithms by varying the number of sensor nodes in the network from 100 to 500, with the other simulation parameters as in Table 2. The simulation results are shown in Figure 3. Figure 3a shows the total data loss of these algorithms during the monitoring period. Our proposed algorithm loses far less data than the other algorithms: the VRPD-RL-PART algorithm loses only 19% (2425 MB vs. 12,593 MB) of the data lost by the RMP-RL-PART algorithm and 8% (2425 MB vs. 29,378 MB) of that lost by the RMP-RL algorithm. This is because our algorithm considers not only the direct data loss due to sensor death but also the impact on network performance of losing critical sensors that forward other nodes' data, and it schedules the MC based on the network state instead of waiting for nodes to send charging requests, enabling globally optimal scheduling. Meanwhile, both the VRPD-GA and VRPD-EDF algorithms cause less data loss than the RMP-RL algorithm, since they also dispense with explicit charging requests, which reduces the charging waiting time of lifetime-critical nodes.
Figure 3b shows the total distance traveled by the MC during the monitoring period. The RMP-RL-PART algorithm has the longest travel distance because partial charging makes the MC move more frequently, although partial charging also significantly reduces network data loss. The travel distance of the VRPD-RL-PART algorithm is shorter than that of most other algorithms and only about 10% longer than that of the VRPD-GA algorithm, which shows that our algorithm guarantees a short MC travel distance while maximizing the network revenue. Figure 3c,d show, respectively, the network runtime at the first sensor death during the monitoring period and the total number of sensor deaths. As the number of sensors increases, the runtime at the first node death gradually decreases, while the total number of sensor deaths gradually increases. Nevertheless, the VRPD-RL-PART algorithm still performs significantly better than the other algorithms in terms of sensor deaths, because it does not require nodes to send explicit charging requests and it evaluates and prioritizes the importance of nodes to ensure the survival of critical nodes.
We then compare the performance of the different algorithms by varying the maximum data rate of the sensors from 10 kbps to 20 kbps. Figure 4a,d indicate that the network data loss and the number of sensor deaths both increase with the maximum sensor data rate. Figure 4c shows that the network runtime until the first sensor death decreases as the data rate increases. This is because a higher data rate raises the sensors' energy consumption rate: sensors drain their energy faster, their residual lifetimes shorten, more lifetime-critical sensors must be charged in each charging cycle, and their charging deadlines overlap, so the probability that a sensor dies before it can be replenished rises sharply. Nevertheless, our proposed VRPD-RL-PART algorithm causes the least network data loss and still performs best. This is because it schedules the MC based on the state of the entire network rather than passively waiting for nodes to send charging requests, and it can therefore charge nodes promptly. In addition, as the data rate and hence the energy consumption rate grow, algorithms that use a full charging mode or require explicit charging requests incur significant data loss. Figure 4b shows that the MC's travel distance also increases with the data rate: faster energy consumption means sensors run out of energy sooner and more of them need charging, so the MC must move more frequently to meet the charging demand.
We next evaluate the algorithm performance by varying the charging power of the MC from 3 W to 7 W. Figure 5a,d indicate that the network data loss and the number of sensor deaths for each algorithm decrease significantly as the charging power of the MC increases. Figure 5c indicates that the network runtime when the first sensor dies for these algorithms also increases significantly with the increase in charging power. This is because, when the charging power increases, the time that it takes for the MC to charge the sensors is significantly reduced and the MC can charge more sensors within the service period. Figure 5b shows that the travel distance of the MC in the VRPD-RL-PART and VRPD-GA algorithms decreases slightly as the charging power increases because the greater charging power gives the MC more time to stay in place and wait for the sensors in the network to consume further energy, which reduces the MC’s travel energy and thus increases the charging efficiency.
We also evaluate the effect of the sensors' battery capacity on algorithm performance by varying it from 8640 J to 12,960 J. Figure 6b shows that the travel distance of the MC decreases as the battery capacity increases. This is because a larger battery allows a sensor to run longer, which extends the MC's service deadlines; the MC therefore does not need to move as often and travels shorter distances. However, the larger battery capacity also increases network data loss, increases the number of sensor deaths, and shortens the time until the first node death, as shown in Figure 6a,c,d. This is because a larger battery capacity means the MC must transfer more energy on each charging visit, which lengthens the charging time. Lifetime-critical sensors therefore wait longer to replenish their energy, fewer sensors can be charged before their service deadlines, the first sensor death occurs earlier, and both the number of sensor deaths and the network data loss increase.
We finally evaluate the impact of the MC's energy capacity on algorithm performance by varying it from 1000 kJ to 2000 kJ. As shown in Figure 7b, a larger energy capacity lets the MC charge more sensors in one charging cycle and make fewer trips to the BS to replenish its own energy, so the distance traveled by the MC decreases as the capacity increases. Since the MC returns to the BS less often, it can spend more time charging sensors during the monitoring period, and the network data loss of these algorithms is slightly reduced. The VRPD-GA algorithm, however, shows the opposite trend: it can serve each node only once per charging cycle, and a longer cycle can cause energy-intensive nodes to run out of energy and die because they cannot be recharged again soon after their previous charge. Moreover, the time of the first node death is not significantly affected by the MC's energy capacity under any of the algorithms.
In summary, by comparing the algorithms in different scenarios, it can be seen from Figure 3, Figure 4, Figure 5, Figure 6 and Figure 7 that our proposed algorithm outperforms the comparison algorithms in different aspects, such as the number of dead sensors, network data loss, first sensor death time, and MC travel distance.

6. Discussion and Future Work

While this work provides an initial exploration of on-demand charging scheduling in WRSNs without explicit charging requests, several research directions merit further investigation, and we intend to pursue them in future work.
  • Charging process optimization. To simplify the problem, most existing studies employing the one-to-one charging paradigm, including this work, assume that the MC must stay near the node to begin charging and then move to the next charging node once charging is complete. Charging, on the other hand, is a continuous operation, and consideration of the charging rate and range during the charging process will optimize the charging algorithm. For example, the MC can start charging a node once it enters the charging range of that node. Consequently, the MC can complete the charging operation while traveling, thus reducing the charging time of the MC and allowing the MC to charge more nodes within a charging cycle.
  • Joint optimization of data collection and charge scheduling. The scheduling of the MC is determined by the node energy consumption, which is mainly determined by the data collection scheme used in the network. As a result, incorporating data collection schemes into our work can increase the revenue of the network. The data collection scheme can provide some prior knowledge, such as the role of each node and the energy consumption rate of each node. Introducing such prior knowledge can accelerate the training of our algorithm. On the other hand, the data collection scheme can precisely perform energy-saving operations such as active energy management or load balancing based on the charging node sequence determined by the charging algorithm. Furthermore, combining the data collection scheme and the charging algorithm will enhance the communication between nodes, between nodes and the MC, or between nodes and the BS. As a result, the performance of the overall network will be considerably improved by these three different types of devices operating together.
  • Application-oriented optimization. A WSN is essentially an application-oriented and data-centric network. Different application scenarios have different communication protocols and different constraints. Therefore, incorporating application-specific runtime parameters or application-specific constraints will improve the performance of the charging scheme.
  • Exact optimization vs. approximate optimization. Extensive prior work, including studies related to ours, has shown that the MC scheduling problem is NP-hard. It is therefore important to improve the efficiency of the charging scheduling algorithm and to narrow the gap between deep-reinforcement-learning-based approximate solutions and the exact optimum.
  • Multiple MCs. A single MC is used in our simulations to provide charging services in a 1000 m × 1000 m area; however, as the network size increases further, it may be difficult to charge the lifetime-critical nodes promptly using only one MC, which affects the network performance. To address this issue, we consider extending single MC scheduling to the scheduling of multiple MCs for cooperative charging, to satisfy the energy demands of large-scale WRSNs.

7. Conclusions

In this work, we propose a novel on-demand partial charging algorithm designed for wireless rechargeable sensor networks. Instead of relying on a predefined threshold or explicit charging requests from the sensor nodes, the proposed algorithm introduces the deep reinforcement learning technique to determine the charging order and charging duration of the sensor nodes based on network state information and globally optimizes mobile charger scheduling from the standpoint of the entire network. Furthermore, unlike previous studies that require the sensors to be fully charged, our algorithm only partially charges them to reduce waiting times at life-critical nodes. Simulation results demonstrate that our proposed algorithm significantly enhances the charging efficiency of the mobile charger while maximizing the network revenue compared to existing algorithms.

Author Contributions

Conceptualization and methodology, W.G. and F.L.; software and validation, W.G. and T.S.; formal analysis, W.G. and F.L.; investigation, Y.L. and T.S.; writing—original draft preparation, W.G.; writing—review and editing, F.L. and Y.L.; project administration, F.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the Sichuan Science and Technology Program (grant no. 2022YFQ0035) and the Open Research Fund of Integrated Computing and Chip Security, Sichuan Collaborative Innovation Center of Chengdu University of Information Technology (grant no. CXPAQ202205).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Kaswan, A.; Jana, P.K.; Das, S.K. A survey on mobile charging techniques in wireless rechargeable sensor networks. IEEE Commun. Surv. Tutor. 2022, 24, 1750–1779.
  2. Rashid, B.; Rehmani, M.H. Applications of wireless sensor networks for urban areas: A survey. J. Netw. Comput. Appl. 2016, 60, 192–219.
  3. Chen, Y.; Lin, J.; Liao, X. Early detection of tree encroachment in high voltage powerline corridor using growth model and UAV-borne LiDAR. Int. J. Appl. Earth Obs. Geoinf. 2022, 108, 102740.
  4. Lin, T.L.; Chang, H.Y.; Wang, Y.H. A novel unmanned aerial vehicle charging scheme for wireless rechargeable sensor networks in an Urban bus system. Electronics 2022, 11, 1464.
  5. Mottaghi, S.; Zahabi, M.R. Optimizing LEACH clustering algorithm with mobile sink and rendezvous nodes. AEU-Int. J. Electron. Commun. 2015, 69, 507–514.
  6. Malebary, S. Wireless mobile charger excursion optimization algorithm in wireless rechargeable sensor networks. IEEE Sens. J. 2020, 20, 13842–13848.
  7. Kurs, A.; Karalis, A.; Moffatt, R.; Joannopoulos, J.D.; Fisher, P.; Soljacic, M. Wireless power transfer via strongly coupled magnetic resonances. Science 2007, 317, 83–86.
  8. Ouyang, W.; Liu, X.; Obaidat, M.S.; Lin, C.; Zhou, H.; Liu, T.; Hsiao, K.F. Utility-aware charging scheduling for multiple mobile chargers in large-scale wireless rechargeable sensor networks. IEEE Trans. Sustain. Comput. 2020, 6, 679–690.
  9. Priyadarshani, S.; Tomar, A.; Jana, P.K. An efficient partial charging scheme using multiple mobile chargers in wireless rechargeable sensor networks. Ad Hoc Netw. 2021, 113, 102407.
  10. Kan, Y.; Chang, C.Y.; Kuo, C.H.; Roy, D.S. Coverage and connectivity aware energy charging mechanism using mobile charger for WRSNs. IEEE Syst. J. 2021, 16, 3993–4004.
  11. Jothiprakasam, S.; Muthial, C. A method to enhance lifetime in data aggregation for multi-hop wireless sensor networks. AEU-Int. J. Electron. Commun. 2018, 85, 183–191.
  12. Wang, C.; Li, J.; Ye, F.; Yang, Y. NETWRAP: An NDN based real-time wireless recharging framework for wireless sensor networks. IEEE Trans. Mob. Comput. 2014, 13, 1283–1297.
  13. Cao, X.; Xu, W.; Liu, X.; Peng, J.; Liu, T. A deep reinforcement learning-based on-demand charging algorithm for wireless rechargeable sensor networks. Ad Hoc Netw. 2021, 110, 102278.
  14. Wei, Z.; Wang, L.; Lyu, Z.; Shi, L.; Li, M.; Wei, X. A multi-objective algorithm for joint energy replenishment and data collection in wireless rechargeable sensor networks. In Proceedings of the Wireless Algorithms, Systems, and Applications: 13th International Conference, WASA 2018, Tianjin, China, 20–22 June 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 497–508.
  15. Boukerche, A.; Wu, Q.; Sun, P. A novel joint optimization method based on mobile data collection for wireless rechargeable sensor networks. IEEE Trans. Green Commun. Netw. 2021, 5, 1610–1622.
  16. Khelladi, L.; Djenouri, D.; Rossi, M.; Badache, N. Efficient on-demand multi-node charging techniques for wireless sensor networks. Comput. Commun. 2017, 101, 44–56.
  17. Kaswan, A.; Tomar, A.; Jana, P.K. An efficient scheduling scheme for mobile charger in on-demand wireless rechargeable sensor networks. J. Netw. Comput. Appl. 2018, 114, 123–134.
  18. Chen, Z.; Shen, H.; Wang, T.; Zhao, X. An adaptive on-demand charging scheme for rechargeable wireless sensor networks. Concurr. Comput. Pract. Exp. 2022, 34, e6136.
  19. He, L.; Cheng, P.; Gu, Y.; Pan, J.; Zhu, T.; Liu, C. Mobile-to-mobile energy replenishment in mission-critical robotic sensor networks. In Proceedings of the IEEE INFOCOM 2014—IEEE Conference on Computer Communications, Toronto, ON, Canada, 27 April–2 May 2014; pp. 1195–1203.
  20. Dong, Y.; Bao, G.; Liu, Y.; Wei, M.; Huo, Y.; Lou, Z.; Wang, Y.; Wang, C. Instant on-demand charging strategy with multiple chargers in wireless rechargeable sensor networks. Ad Hoc Netw. 2022, 136, 102964.
  21. Gao, Z.; Chen, Y.; Fan, L.; Wang, H.; Huang, S.C.H.; Wu, H.C. Joint energy loss and time span minimization for energy-redistribution-assisted charging of WRSNs with a mobile charger. IEEE Internet Things J. 2022, 10, 4636–4651.
  22. Dudyala, A.K.; Ram, L.K. Improving the lifetime of wireless rechargeable sensors using mobile charger in on-demand charging environment based on energy consumption rate prediction. In Proceedings of the 1st International Conference on Computational Electronics for Wireless Communications: ICCWC 2021, Haryana, India, 11–12 June 2021; pp. 679–688.
  23. Gharaei, N.; Al-Otaibi, Y.D.; Butt, S.A.; Malebary, S.J.; Rahim, S.; Sahar, G. Energy-efficient tour optimization of wireless mobile chargers for rechargeable sensor networks. IEEE Syst. J. 2020, 15, 27–36.
  24. Lin, C.; Zhou, J.; Guo, C.; Song, H.; Wu, G.; Obaidat, M.S. TSCA: A temporal-spatial real-time charging scheduling algorithm for on-demand architecture in wireless rechargeable sensor networks. IEEE Trans. Mob. Comput. 2017, 17, 211–224.
  25. Das, R.; Dash, D. Joint on-demand data gathering and recharging by multiple mobile vehicles in delay sensitive WRSN using variable length GA. Comput. Commun. 2023, 204, 130–146.
  26. Xu, W.; Liang, W.; Lin, X.; Mao, G. Efficient scheduling of multiple mobile chargers for wireless sensor networks. IEEE Trans. Veh. Technol. 2015, 65, 7670–7683.
  27. Xu, W.; Liang, W.; Jia, X.; Xu, Z.; Li, Z.; Liu, Y. Maximizing sensor lifetime with the minimal service cost of a mobile charger in wireless sensor networks. IEEE Trans. Mob. Comput. 2018, 17, 2564–2577.
  28. Thangiah, S.R.; Osman, I.H.; Vinayagamoorthy, R.; Sun, T. Algorithms for the vehicle routing problems with time deadlines. Am. J. Math. Manag. Sci. 1993, 13, 323–355.
  29. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347.
  30. Wang, Y.; He, H.; Tan, X.; Gan, Y. Trust region-guided proximal policy optimization. Adv. Neural Inf. Process. Syst. 2019, 32.
  31. Shi, Y.; Xie, L.; Hou, Y.T.; Sherali, H.D. On renewable sensor networks with wireless energy transfer. In Proceedings of the 2011 IEEE INFOCOM, Shanghai, China, 10–15 April 2011; pp. 1350–1358.
Figure 1. On-demand charging problem scenario. (a) Using a predefined request threshold (below 60 min) and charging only sensors that send explicit charging requests, and (b) without explicit charging requests.
Figure 2. Workflow of the proposed algorithm.
Figure 3. Performance of the five algorithms by varying the number of sensors N from 100 to 500.
Figure 4. Performance of the five algorithms by varying the maximum data rate of sensors dr_max from 10 kbps to 20 kbps while N = 300.
Figure 5. Performance of the five algorithms by varying the charging power of the MC μ from 3 W to 7 W while N = 300.
Figure 6. Performance of the five algorithms by varying the battery capacity of sensors b from 8640 J to 12,960 J while dr_min = 1 kbps, dr_max = 16 kbps, and N = 300.
Figure 7. Performance of the five algorithms by varying the energy capacity of the MC B from 1000 kJ to 2000 kJ while dr_min = 1 kbps, dr_max = 18 kbps, and N = 300.
Table 1. List of notations.

  Notation   Description
  N          Total number of sensors
  b          Battery capacity of sensors
  d_{i,j}    Euclidean distance between nodes v_i and v_j
  dr_i       Data rate of sensor v_i
  ρ_i        Energy consumption rate of sensor v_i
  re_i       Residual energy of sensor v_i
  p_i        Number of packets transmitted by sensor v_i
  rl_i       Residual lifetime of sensor v_i
  B          Energy capacity of the MC
  μ          Charging power of the MC
  v          Moving speed of the MC
  ξ          Moving energy consumption rate of the MC
Table 2. Simulation parameters.

  Parameter                            Value
  Square area                          1000 m × 1000 m
  Number of sensors N                  100–500
  Battery capacity of sensors b        10.8 kJ
  Data rate of sensors dr_i            [1 kbps, 10 kbps]
  Energy capacity of MC B              2000 kJ
  Moving speed of MC v                 5 m/s
  Moving cost of MC ξ                  600 J/m
  Charging power of MC μ               5 W
  Monitoring period T_M                1 year
  Reward function weight α             1000
  Reward function weight β             1.5
  Discount factor γ                    0.9
  Number of possible charging sizes k  5
Share and Cite

Gao, W.; Li, Y.; Shao, T.; Lin, F. An On-Demand Partial Charging Algorithm without Explicit Charging Request for WRSNs. Electronics 2023, 12, 4343. https://doi.org/10.3390/electronics12204343