Article

Resource-Aware Federated Hybrid Profiling for Edge Node Selection in Federated Patient Similarity Network

by Alramzana Nujum Navaz 1,*, Hadeel T. El Kassabi 2, Mohamed Adel Serhani 3 and Ezedin S. Barka 4
1 Department of Computer Science and Software Engineering, College of Information Technology, UAE University, Al Ain P.O. Box 15551, United Arab Emirates
2 Faculty of Applied Sciences & Technology, Humber College Institute of Technology & Advanced Learning, Toronto, ON M9W 5L7, Canada
3 College of Computing and Informatics, Sharjah University, Sharjah P.O. Box 27272, United Arab Emirates
4 Department of Information Systems and Security, College of Information Technology, UAE University, Al Ain P.O. Box 15551, United Arab Emirates
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(24), 13114; https://doi.org/10.3390/app132413114
Submission received: 6 October 2023 / Revised: 28 November 2023 / Accepted: 4 December 2023 / Published: 8 December 2023

Abstract: The widespread adoption of edge computing for resource-constrained devices gives rise to computational straggler problems, primarily due to the heterogeneity of edge node resources. This research addresses these issues by introducing a novel resource-aware federated hybrid profiling approach, which classifies edge node resources against relevant performance metrics and leverages their capabilities to optimize performance and improve Quality of Service (QoS), particularly in real-time eHealth applications. Such paradigms include Federated Patient Similarity Network (FPSN) models, which distribute processing across edge nodes and fuse the built PSN matrices in the cloud, presenting a unique challenge: optimizing training and inference times while ensuring efficient and timely updates at the edge nodes. To address this concern, our profiling approach measures the available static and dynamic resources of the edge nodes. By selecting nodes with the appropriate resources, we aim to optimize the FPSN to ensure the highest possible QoS for its users. We conducted experiments using edge performance metrics, i.e., accuracy, training convergence, memory and disk usage, execution time, and network statistics. These experiments demonstrate our work’s contribution to optimizing resource allocation and enhancing the performance of real-time eHealth applications using edge computing.

1. Introduction

Healthcare applications and devices collect vast amounts of patient data from sources such as sensors and medical equipment. These data have grown exponentially in recent years, making it difficult for providers to efficiently manage the data and associated resources. Edge computing is essential for utilizing automation and machine learning technologies to process and store massive amounts of data at the network’s edge, close to the point of collection. This allows doctors to take immediate action to help patients avert medical emergencies. Edge computing can thus make the healthcare sector more streamlined and efficient.
The concept of Federated Learning (FL) [1,2,3] has recently gained popularity in the healthcare industry as a means of solving the issues of data privacy and security through the cooperative training of algorithms that does not require the sharing of individual data sets. While it can be challenging to obtain large amounts of medical data, FL helps solve this problem by facilitating learning through collaboration rather than data centralization. Without transferring patient data outside of their current institutions’ firewalls, FL enables interactive insight-gathering, for example in the form of a consensus model. Instead, ML is performed locally at each participating institution, and only the characteristics of the models themselves (such as parameters and gradients) are shared. Researchers have found that FL-trained models outperform those that have access to data from only a single institution or node.
When the data defining the optimization problem are dispersed unevenly among a very large number of nodes, ML presents another important context for distributed optimization. Federated Optimization [4] aims to minimize the number of communication rounds while maintaining highly efficient communication in order to train a high-quality centralized model that can provide on-device intelligence. With the increasing edge computing challenges of processing, storage, and network I/O, federated resource profiling is a promising approach that can assess local computation capacities and enhance the resource utilization capabilities of a distributed computing environment.
FL supported by resource profiling will have a minimal overhead because it will only evaluate the measurements and variance of the edge node’s resources. Quality of Service (QoS) and Quality of Experience (QoE) are key metrics in evaluating telecommunication services. QoS primarily concerns the technical aspects and performance metrics of a service, such as latency and bandwidth, while QoE focuses on the subjective user experience, encompassing aspects like satisfaction and usability. Together, these metrics provide a comprehensive understanding of service effectiveness from both technical and user-centric perspectives. Given that health tasks require a significant amount of computing power, the system must ensure strict QoS and QoE requirements. Moreover, each edge node can only perform computations up to its maximum capacity. A data-driven federation offers a trifecta of benefits, including the ability to optimize the consumption of data resources, reduce transportation costs, and enforce data privacy.
The proposed Federated Resource Profiling (FRP) model is intended to be used in the context of Patient Similarity Networks (PSNs) [5,6,7,8] and Federated PSN (FPSNs) [9], a relatively new network-based approach in precision medicine in which patients are grouped or categorized according to their similarities in a variety of characteristics. Similar instances may be grouped together to reveal common patterns in the progression of illnesses [10]. FPSN models distribute processing across federated nodes and then fuse the PSN matrices in the cloud, highlighting the need for a balance between computational speed and accuracy. This research provides valuable insights and practical solutions to address the challenges associated with resource limitations and network latency, contributing to the advancement of edge computing in the eHealth domain.
The rest of the paper is organized as follows. The next section surveys existing work on resource-aware FL and FL-inspired resource profiling. Section 3 provides a classification of the edge resources while Section 4 introduces the proposed resource-aware profiling model with a detailed formulation of the edge workload profile and node selection. Section 5 discusses the challenges encountered in the implementation of the proposed model and presents feasible solutions. In Section 6, the model is evaluated using the fetal health monitoring dataset distributed over various edge nodes. Section 7, ‘Future Directions’, elaborates on ongoing efforts to enhance FL systems, focusing on various advancements. Finally, Section 8 concludes the paper by summarizing the key findings and contributions of the study.

2. Related Work

Although some studies highlight resource-aware FL, very few have explored FL-enabled resource profiling that aids in node selection for optimizing the overall performance of the federated learning context. The following subsections review existing works that considered the resource characteristics of FL, FL-inspired resource profiling, and FL-driven client selection.
(A)
Resource-aware Federated Learning
Recently, several research works related to FL have focused on developing algorithms for resource optimization based on one or more criteria, including cost, time, energy consumption, CPU power, and memory. In [11], the authors developed a deep reinforcement learning algorithm using an optimization function for minimizing the cost based on training time and energy consumption. The objective was to increase the speed of training while conserving the energy consumption of mobile devices by controlling the CPU cycle frequency. Furthermore, the authors of [12] proposed an optimization framework based on Q-learning. Their goal was to minimize the communication rounds and maximize the accuracy. A hierarchical game framework was proposed to allow dynamic edge association and resource allocation in self-organizing Hierarchical FL networks [13]. The authors of [14] introduced the q-Fair optimization objective to emphasize a fair distribution among edges in FL according to performance, which is based on the model’s accuracy. To minimize the latency of FL training, the authors of [15] proposed a Multi-Armed Bandit algorithm based on training latency, availability, and fairness constraints.
A notable contribution in resource allocation and optimization in dynamically evolving network environments is proposed by Huang et al. [16]. Their research investigates the complexities of resource allocation in FL systems assisted by intelligent reflecting surfaces (IRS), particularly under the constraints of imperfect channel state information (CSI). By employing advanced optimization techniques like semi-definite relaxation (SDR) and the constrained concave convex procedure (CCCP), they present a method to significantly reduce training time in these systems, ultimately mitigating resource allocation challenges.
A framework proposed in [17] describes a resource- and data-aware FL approach that characterizes the agents/resources according to certain criteria. Initially, it uses the local data size to classify the resources. Other criteria can also be used for resource selection to more accurately classify the resources used in a priority-based decision-making model such as the Multiple-Criteria Decision-Making (MCDM) model. Experiments showed improvements in accuracy with less training time over the standard FL baseline. The authors of [18] studied the effects of resource and dataset heterogeneity on the training time of FL deployed on AWS EC2. Their experiments showed that resource heterogeneity, in terms of response time and data quality, has a severe impact on the training time of FL.
(B)
Federated Learning inspired resource profiling
Few resource-aware FL frameworks have adopted resource profiling to improve the performance of the FL model. For example, ElFISH is a resource-aware FL framework that adopted resource profiling according to the edge computation capabilities to attain synchronized collaboration [19]. Experimentation showed an improved training accuracy of up to 4% and double the speed over conventional FL schemes. The authors of [20] proposed a Tier-based FL system that classified the clients adaptively into groups according to their performance, then selected only the nodes within the same group to avoid straggler problems and delays caused by the slowest devices resulting from resource heterogeneity. The results from their experiments showed that the proposed approach improved the training speed while keeping the accuracy the same or slightly increased. Furthermore, the authors of [21] adopted a low-complexity profile-based resource-aware allocation strategy in FL called Dispersed Federated Learning (DFL). They implemented an optimization problem for resource allocation and device association by defining the preference and resource profiles. Preference profiles were used to allocate resource blocks to devices having a lower cost. Alternatively, the authors of [22] proposed a FL time minimization problem to control the data size used in each device using the makespan minimization problem and assigned identical tasks to heterogeneous resources. They adopted an optimal polynomial-time scheduling algorithm called OLAR.
(C)
Federated Learning-driven edge node selection
In FL, the number of clients may be large, but the bandwidth available for model distribution and re-uploading is limited, making it advisable to involve only a few of the edge nodes in the training process at any given time. This challenge must be overcome to ensure that the model training process runs smoothly and without delay. With regard to training efficiency, the quality of the final model, and its fairness, a client selection policy is crucial to an FL process. In [23], the authors proposed a model wherein the Lyapunov optimization problem of selecting clients with the highest degree of fairness was handled. Although FL has the potential to safeguard privacy in distributed machine learning, its limited applicability is due to inefficiencies in its communication [24]. With hierarchical FL (HFL), workers who own the data first send their updated model parameters to the edge servers for intermediate aggregation before sending everything to the central server for global aggregation. For effective HFL deployment, it is necessary to address the issues of edge association and resource allocation for non-cooperative parties such as edge servers and model owners. Using the edge server’s predicted bandwidth allocation management approach in self-organizing HFL networks, a hierarchical game framework [13] is proposed to analyze the dynamics of edge association and resource allocation. While collecting more local data from the clients enables more precise global model building, the resulting heavy data arrivals at the edge could be counterproductive to queue stability. The authors of [25] proposed an algorithm that varied the number of clients chosen based on their current resource usage for transmission in order to maximize the time-averaged FL accuracy while maintaining the queue’s stability.
In a traditional wireless network, an FL setup involves clients sharing a wireless connection to a central server where the federated models are trained. Because both radio energy and the clients’ energy are limited [26], the learning performance depends on how the clients are selected and how the bandwidth is distributed among the selected clients in each learning round. To handle a stochastic optimization problem for joint client selection and bandwidth allocation under long-term client energy constraints, the authors of [27] developed a new algorithm that uses only the currently available wireless channel information but still guarantees long-term performance.
(D)
QoS and QoE in Federated Learning Systems
QoS and QoE are fundamental to the effectiveness of FL systems, especially in edge computing environments, and are particularly crucial in sectors like healthcare [28], where the precision and speed of data processing are essential. QoS, as defined by ITU-T E.800, encompasses critical service characteristics like latency and bandwidth that directly impact the performance of FL systems [29]. QoE, on the other hand, focuses on the user’s perception of and satisfaction with the service, as outlined by ITU-T P.10 and Qualinet [30].
Recent studies emphasize the importance of QoS and QoE in heterogeneous networks. Poryazov et al. [31] address QoS estimation challenges, proposing a normalization approach for telecommunication systems. Huang et al. [16] explore resource allocation in FL systems, contributing to the understanding of QoS under imperfect channel conditions. These insights are vital for optimizing both QoS and QoE in FL systems.
Ensuring high QoS and QoE in FL systems, particularly in data-intensive applications like healthcare, involves addressing challenges such as resource heterogeneity and network variability. Adhering to standards like ITU-T Supp. 9 of the E.800 Series [30] ensures consistency and interoperability in different computing scenarios.
Optimization of QoS and QoE in FL requires adaptive strategies that consider the dynamic nature of edge computing. Simulation tools like iFogSim and FogNetSim++ are instrumental in modeling network characteristics and resource utilization, facilitating the development of strategies that enhance both QoS and QoE [32].
The integration of QoS and QoE considerations is crucial for the development of efficient, reliable, and user-centric FL systems in edge computing. By focusing on these quality metrics and leveraging advanced simulation tools, FL systems can be optimized to meet diverse user needs and network conditions, enhancing the overall effectiveness of edge computing services.
There is a gap in the literature concerning the significance of measuring performance metrics in optimizing edge performance, as well as the appropriate classification of edge resources and the relevant metrics. To the best of our knowledge, previous works on FL resource profiling have not combined both static and dynamic edge resource quality profile attributes with edge reputation information to ensure optimal edge node selection. With this in mind, we propose a hybrid resource-aware FPSN solution that integrates multi-criteria resource quality evaluations and edge reputation to decrease response times and improve prediction performance.

3. Edge Resources Classification

Efficient optimization of the edge resources can enable real-time analytics and business intelligence, which can be used to gain a competitive advantage. One of the first steps in creating adaptive resource-management systems is determining the various application-specific performance characteristics and metrics of edge-based resources [33,34]. It is challenging to gather different performance metrics in a timely and accurate manner due to the abundance of deployed hardware configurations with various architectures and versions [35].
Resource profiling is the first step in determining how resources are currently being used for the best management of the hardware, and it is necessary to identify efficient methods for controlling hardware and obtaining the best performance out of it. Hence, prior to resource profiling, a selection of the edge resources and their performance characteristics is identified, as depicted in Figure 1 and Table 1, which contain the respective descriptions and metrics. The psutil (process and system utilities) Python library [36] provides access to system and process information on any supported platform, such as CPU, memory, disks, network, and sensors. The categorization is largely based on such utility functions, which can monitor edge nodes to help create the resource profile, control access to system resources, and manage processes in progress.
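As an illustration, the following sketch uses psutil to assemble static and dynamic resource snapshots of the kind described above. The field names are our own choices, and Table 1's metric set is broader than this minimal example.

```python
import psutil

def collect_static_profile():
    """Static (rarely changing) edge characteristics."""
    return {
        "cpu_cores": psutil.cpu_count(logical=False),
        "cpu_threads": psutil.cpu_count(logical=True),
        "mem_total": psutil.virtual_memory().total,  # bytes
        "disk_total": psutil.disk_usage("/").total,  # bytes
    }

def collect_dynamic_profile():
    """Snapshot of real-time resource availability on this node."""
    net = psutil.net_io_counters()
    profile = {
        "cpu_percent": psutil.cpu_percent(interval=0.5),  # sampled over 0.5 s
        "mem_available": psutil.virtual_memory().available,
        "disk_free": psutil.disk_usage("/").free,
        "net_bytes_sent": net.bytes_sent,
        "net_bytes_recv": net.bytes_recv,
    }
    # sensors_battery() returns None on machines without a battery and is
    # unavailable on some platforms, hence the guard
    battery = psutil.sensors_battery() if hasattr(psutil, "sensors_battery") else None
    profile["battery_percent"] = battery.percent if battery else None
    return profile
```

In a deployment, the Node Resource Monitoring Agent would run such collectors periodically and feed the results into the edge's workload profile.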
In addition to the quantitative performance qualities indicated in Table 1, the following intangible elements can also improve edge performance.
Agility: This involves preparing for unknown workloads and applications, reconfiguring component systems, and having edge management tools that adapt. It relates to the velocity component of big data and the capacity to process continuously flowing streams in a timely manner [37].
Total Cost of Ownership: Edge providers must weigh the costs and benefits of using industry-standard hardware, storage, support, and other services, as the redundant infrastructures required to support cloud computing have a significant impact on the TCO [38]. Detaching storage components from other parts of the system may help with scalability.
Serviceability: Edge locations can be difficult to service due to distance, cost, and the volume of sites to service. The serviceability of a network is defined as its capacity to meet the needs of connected end devices (e.g., throughput, delay, and packet loss) [39]. In the 5G era, high serviceability at the edge of the network is seen as a crucial foundational criterion for a successful Internet of Things paradigm. A software-defined implementation with high availability and durability of hardware and storage media is recommended.
Security and Privacy: Content perception, real-time processing, and parallel processing are some of the ways in which edge computing differs from more traditional forms of computing, and this difference has led to new challenges in the area of data security and privacy protection [40]. There is a large attack surface on edge resources, so security countermeasures need to be effective and efficient. DoS attacks, penetration techniques, and physical break-ins are all part of this category. Since manual patches can have an impact on performance, automated remote patching is an absolute necessity from a security perspective.

4. Resource-Aware Federated Profiling Model

In this section, we provide an overview of our resource-aware federated profiling model, the formulation of edge workload profiling, and an explanatory example of the edge workload profile. The resource-aware federated profiling framework was established to make the most effective use of the distributed computing resources at the edges.
(A)
Model Overview
The Federated Workload Profiling model proposed in this section addresses resource planning. Planning of the edge workload is formulated as a resource optimization problem that eliminates the nodes that do not meet the computing demands.
The model, as shown in Figure 2, is a hybrid resource profiling model that combines static and dynamic profiling. Static profiling mainly depends on the physical characteristics of the edge resources, such as the CPU’s computation power, memory size, and transmission power. Dynamic profiling, on the other hand, relies on real-time measurements of the available resources and performance metrics at the edge. The real-time available resources include, but are not limited to, CPU utilization, memory usage, and network delays. The edge’s performance metrics are based on data training performance, such as the model’s accuracy and convergence time.
Initially, a baseline resource profile, based on static profiling, is collected from all edges. An Edge Workload Profile (EWP) is created based on the configuration file available at the edge and real-time measurements of parameters including the task, the incoming data, the outgoing data, the infrastructure (CPU/GPU, storage), the mobility of the edge device, the round-trip latency based on the current location, and the reputation of the edge. The Edge Workload Profiling Module (EWPM), which is responsible for continuously updating the EWP, receives information about the dynamic characteristics of the edge resources from the Node Resource Monitoring Agent (NRMA) at the edge node. Each client creates its own edge workload profile, which is then sent to the federation server, where the Federated Resource Profile is built in accordance with the federation’s rules. Both the EWP’s resource parameters and the edge node’s reputation, which reflects prior edge performance, are taken into consideration by the federation’s rules.
The Federated Workload Profiling (FWP) module aggregates the EWPs from all nodes and provides inputs to the Node Selection Module (NSM), which selects the edges according to thresholds on the resource parameters that have been stated in advance. The thresholds for CPU, memory, network, sensor, and storage metrics in the NSM depend on the system’s requirements and resources, but a general guideline is provided in Table 2. It is important to note that these thresholds are not set rigidly and may need to be adjusted based on the specific requirements of the system and the workload being processed. The effect of different thresholds may vary depending on the number of edges and their individual resource capabilities. For example, if the threshold for network usage is set too low, it may lead to slower communication and longer training times, particularly when there are a large number of edges. On the other hand, if the threshold is set too high, it may lead to network congestion and communication failures.
Additionally, the thresholds can be set dynamically based on the workload and resource utilization patterns over time. Resource profiling will initially rely on expert-set thresholds, using their deep understanding of system capabilities and expected workloads. Over time, as the system accumulates and analyzes historical data on resource usage, these thresholds can be refined to more accurately reflect the real-time demands and efficiency requirements. This evolving process allows the system to dynamically adapt its resource allocation, enhancing performance and resource management based on actual usage patterns. For instance, in a cloud-based service use case, the system initially operates with expert-set CPU and network bandwidth thresholds. However, it observes that during peak business hours, CPU usage consistently exceeds these thresholds, leading to performance bottlenecks. By dynamically adjusting the CPU threshold based on this historical data, the system can allocate more processing power during these peak hours, thereby reducing latency and enhancing user experience. This practical utility demonstrates how adaptive resource profiling can respond to fluctuating workload demands, ensuring optimal resource utilization and system performance.
The client edge is discarded if the edge resources are limited. For the duration of the data streaming process, iterations are performed, and the profile is incrementally updated. To minimize the overhead, only the profile updates are transmitted following the initial run and at every federated aggregation event. As time is often a deciding factor in PSN deployments, the objective of this process was to select the nodes with the fastest response times by facilitating the faster convergence of the ML algorithms with resource optimizations.
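The NSM's threshold filtering can be sketched as follows. The metric names and threshold values below are illustrative assumptions, not the guideline values of Table 2; a real deployment would load them from the system's requirements.

```python
# Hypothetical thresholds in the spirit of Table 2's guidelines
THRESHOLDS = {
    "cpu_percent_max": 80.0,                  # reject nodes busier than this
    "mem_available_min": 512 * 1024 * 1024,   # at least 512 MiB spare memory
    "disk_free_min": 1024 ** 3,               # at least 1 GiB free disk
    "net_latency_ms_max": 200.0,              # round-trip latency ceiling
}

def select_nodes(profiles, thresholds=THRESHOLDS):
    """Return ids of nodes whose profiles pass every threshold.

    `profiles` maps node id -> dict of measured metrics (names assumed here).
    Nodes failing any check are discarded, mirroring the NSM's behavior.
    """
    selected = []
    for node_id, p in profiles.items():
        ok = (
            p["cpu_percent"] <= thresholds["cpu_percent_max"]
            and p["mem_available"] >= thresholds["mem_available_min"]
            and p["disk_free"] >= thresholds["disk_free_min"]
            and p["net_latency_ms"] <= thresholds["net_latency_ms_max"]
        )
        if ok:
            selected.append(node_id)
    return selected
```

Dynamic threshold adaptation, as discussed above, would amount to periodically rewriting the entries of this threshold table from observed workload history.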
(B)
Workload profiling model formulation
The EWP is executed at the edge to determine whether the edge satisfies the federation’s profiling requirements. The static EWP is based on C_e, the maximum computation capacity of the edge node e; β_e, the computation cost of profiling; μ_e, the effective memory at the edge; and S_e, the data size available at the edge. The dynamic EWP, in contrast, relies on a resource usage monitoring process that is responsible for collecting the edge’s real-time status in terms of resource utilization and performance. This process collects the CPU usage C_u, the effective spare memory μ_u, and the battery level E_e. Furthermore, this process initiates a training model on the existing data and collects the accuracy results to measure the performance P_e, as well as the training time T_train. The results are sent back to the server for profiling.
The server, specifically the FWP module, collects the aforementioned parameters and measures the transmission cost of the EWP message from each node e, denoted EWPT_e. Let T_delay be the tolerable delay. The model considers the reputation of the edge as part of the profiling evaluation. The reputation of an edge e is determined by its mobility status and its historic performance in terms of the number of previously successful model training requests, using the following formula:
Reputation_e = (No. of successful trainings of e) / (Total no. of requests to e) × M_e, where M_e = 1 if e is a Mobile Node and M_e = 0 if e is a Stationary Node.
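Reading the piecewise 1/0 term as a mobility indicator that multiplies the success ratio (an interpretation on our part), the reputation score can be computed as:

```python
def reputation(successful_trainings, total_requests, is_mobile):
    """Reputation sketch: success ratio gated by a mobility indicator
    (1 for mobile nodes, 0 for stationary ones). The gating reading of the
    published formula is an assumption."""
    if total_requests == 0:
        return 0.0  # no training history yet
    mobility = 1.0 if is_mobile else 0.0
    return (successful_trainings / total_requests) * mobility
```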
The Data Flow Rate DFR_e is expressed as a function of the effective distance between the edge node e and the centralized cloud to the average packet forwarding rate. The FWP module at the server generates an aggregated profile EWP_e for each edge using a utility function, which calculates the weighted average of all resources’ normalized quality attributes, including the static and dynamic characteristics, as follows:
EWP_e = U(C_e, β_e, S_e, μ_e, DFR_e, Reputation_e) for the Static_Profile, and
EWP_e = U(C_u, E_e, P_e, μ_u, T_train, DFR_e, Reputation_e) for the Dynamic_Profile,
s.t. EWPT_e < T_delay
This utility function is used in an MCDM modeling problem, where the edge quality profile attributes are normalized to the same interval scale and prioritized according to weights given by domain knowledge experts, depending on the application type and its requirements.
To further explain the problem, the MCDM model is formalized as follows. Let A be the set of edge quality profile attributes, where
A = {C_e, β_e, S_e, μ_e, DFR_e, Reputation_e} in the case of static profiling, and
A = {C_u, E_e, P_e, μ_u, T_train, DFR_e, Reputation_e} when dynamic profiling is considered.
Let n be the number of quality attributes used for resource profiling, and let {a_1, a_2, …, a_n} be the set of normalized values measured for each of the quality attributes in A. Let W = {w_1, w_2, …, w_n} be the set of weights given to the corresponding quality attributes in A, affording flexible prioritization of each attribute according to the application’s requirements and preferences.
EWP_e = Σ_{i=1}^{n} w_i a_i, where Σ_{i=1}^{n} w_i = 1
If the status of the edge node given by EWP_e satisfies the federated profiling requirements and the time constraint is met, the edge is selected. The Delay_e factor should be kept to a minimum for time-sensitive applications.
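A minimal sketch of this weighted-utility scoring follows. Min-max normalization and the example weights are our own illustrative choices; the paper assigns weights via domain experts.

```python
def normalize(values, higher_is_better=True):
    """Min-max normalize raw attribute values onto [0, 1].

    Cost-type attributes (e.g., latency) are inverted so that larger
    normalized values are always better.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    norm = [(v - lo) / (hi - lo) for v in values]
    return norm if higher_is_better else [1.0 - v for v in norm]

def ewp_utility(attrs, weights):
    """EWP_e = sum_i w_i * a_i over normalized attributes; weights sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(w * a for w, a in zip(weights, attrs))
```

For example, scoring three nodes on computation capacity (benefit) and latency (cost) with weights 0.7/0.3 ranks the high-capacity, low-latency node highest.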
(C)
Example of an Edge Workload Profile (EWP)
A sample of the Edge Workload Profile which captures the static and dynamic characteristics of an edge node is presented in Figure 3.
Dynamic parameters, such as CPU utilization, network statistics, and battery information are collected by the EWPM’s resource monitoring agent; static parameters are kept up to date via the nodes’ resource configuration files; and both sets of data are used to optimize the workloads. For the purpose of determining a node’s reputation, the EWP keeps track of the total number of training requests as well as the number of requests that have been successfully completed. The FWP module at the federation server further parses the EWP received from all the edges.
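For illustration, such an EWP might be serialized as a simple dictionary. The field names and values below are hypothetical and will differ from the exact schema shown in Figure 3.

```python
# Hypothetical EWP for one edge node; field names are illustrative only
edge_workload_profile = {
    "node_id": "edge-07",
    "static": {                      # from the node's resource configuration file
        "cpu_cores": 4,
        "mem_total_mb": 4096,
        "storage_total_gb": 64,
        "transmission_power_dbm": 20,
    },
    "dynamic": {                     # from the NRMA's real-time monitoring
        "cpu_percent": 37.5,
        "mem_available_mb": 1850,
        "battery_percent": 82,
        "net_latency_ms": 48,
        "train_time_s": 12.4,
        "accuracy": 0.93,
    },
    "reputation": {                  # counters used to derive the reputation score
        "successful_trainings": 18,
        "total_requests": 20,
        "mobile": True,
    },
}
```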

5. Model Challenges and Feasible Solutions

(A)
Security and Privacy Challenges
The proposed model is concerned with maximizing resource usage and selecting the optimal edge nodes, but it also presents possible privacy concerns related to sharing sensitive information about the resources of the edge nodes. To address these concerns, the proposed model must implement robust security measures to protect the sensitive resource information from unauthorized access or misuse. For example, the model could use encryption techniques to protect the data during transmission and storage. Access controls can also be implemented to restrict access to the resource’s information to authorized parties only. Additionally, the model must be transparent about its data handling and sharing practices, providing clear and concise privacy policies and obtaining informed consent from users before collecting and sharing their data.
The FWP model’s design incorporates a reputation-based system that assigns a reputation score to each node based on its past behavior. Nodes with a low reputation score are then excluded from the federated learning model.
Another approach is to use techniques such as differential privacy or, more specifically FL enabled personalized differential privacy [41] to protect the privacy of the data, even in the presence of malicious nodes. Differential privacy adds noise to the data before sharing it with the nodes, which makes it difficult for malicious nodes to identify the data of a specific node. Other approaches include blockchain-based [42,43] privacy information security sharing schemes.
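As an illustration of the noise-addition idea, the standard Laplace mechanism is sketched below; the personalized differential privacy scheme cited above [41] is more involved, and the sensitivity and epsilon values here are placeholders.

```python
import numpy as np

def laplace_mechanism(update, sensitivity, epsilon, rng=None):
    """Perturb a model update (numpy array) with Laplace noise of scale
    sensitivity/epsilon before it leaves the edge node. A sketch only:
    real deployments also need gradient clipping and privacy accounting."""
    if rng is None:
        rng = np.random.default_rng()
    scale = sensitivity / epsilon
    return update + rng.laplace(loc=0.0, scale=scale, size=update.shape)
```

The added zero-mean noise makes it difficult for a malicious participant to attribute any particular value to a specific node, at the cost of some accuracy.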
Overall, it is important to balance the benefits of the proposed model’s resource optimization with the potential privacy risks, ensuring that adequate measures are in place to protect sensitive data and respect the users’ privacy rights. This not only applies to FL, but also to any distributed or collaborative approaches that require data sharing and processing at the edge nodes.
(B)
Communication Overhead
If the edge nodes continuously update their resource information in the Edge Workload Profiling (EWP) module, this may incur additional communication costs. However, these costs can be reduced by optimizing the communication protocol and minimizing the frequency of updates. For example, the edge nodes can send updates only when there is a significant change in their resource availability or utilization.
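The significant-change update policy described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name `ChangeGatedReporter` and the `send_update` callback are hypothetical, and the node is assumed to sample normalized utilization fractions (e.g., via a library such as `psutil`).

```python
class ChangeGatedReporter:
    """Forwards a resource sample to the server only when it differs
    from the last reported sample by more than `threshold`, reducing
    the communication overhead of continuous resource updates."""

    def __init__(self, send_update, threshold=0.10):
        self.send_update = send_update  # transport callback to the EWP/server
        self.threshold = threshold      # minimum change worth reporting
        self._last = None               # last sample actually sent

    def observe(self, sample):
        # sample: dict of utilization fractions, e.g. {"cpu": 0.42, "mem": 0.61}
        if self._last is None or any(
            abs(sample[k] - self._last.get(k, 0.0)) > self.threshold
            for k in sample
        ):
            self.send_update(sample)
            self._last = dict(sample)
            return True   # update was sent
        return False      # suppressed: change below threshold
```

With a 10-point threshold, small fluctuations are absorbed locally and only meaningful shifts in availability or utilization reach the federation server.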
There is a risk that an edge node may provide incorrect or misleading information about its resources. This can happen due to various reasons such as hardware or software failures, malicious attacks, or incorrect configurations.
Several approaches can be taken in such situations, including but not limited to the following:
Resource verification: The EWP should verify and validate the resources before accepting the resource information from an edge node.
Data encryption: To prevent malicious attacks on the data, each edge node encrypts its data before sending them to the server. The server can then decrypt the data and verify their authenticity. Note that, in the current context, encryption is intended for resource profile data, not health datasets, and focuses on protecting personally identifiable information or other sensitive details related to the resources.
Model aggregation: The server can use a weighted averaging technique to aggregate the updates from all the participating edges. This method assigns more weight to information received from nodes with a better reputation and less weight to updates from edges that have previously sent erroneous data.
Node selection: The server will dynamically select edge nodes based on their performance history. Accordingly, nodes with a history of sending unreliable data can be excluded from the training process.
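The reputation-weighted model aggregation in the list above can be sketched as follows; a minimal illustration (the function name is hypothetical), assuming each update arrives as a (value, reputation) pair with reputations normalized to [0, 1]:

```python
def reputation_weighted_profile(updates):
    """Aggregate per-node resource reports into one federated value,
    giving more weight to nodes with a better reputation and less
    weight to nodes that have previously sent erroneous data.

    updates: list of (value, reputation) pairs; reputation in [0, 1].
    """
    total_rep = sum(rep for _, rep in updates)
    if total_rep == 0:
        raise ValueError("no node has a positive reputation")
    return sum(val * rep for val, rep in updates) / total_rep
```

A node whose reputation has dropped to zero contributes nothing to the federated profile, which is the aggregation-level counterpart of excluding unreliable nodes at selection time.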
(C)
Computational Overhead
Given that we are only processing the quality profile and not the entire dataset, the computational overhead remains constant regardless of the size of the data. However, profile federation overhead grows in proportion to the number of nodes.
Quality profile federation is the process of aggregating multiple quality profiles into a single federated profile. The EWP contains a set of resource attributes, their tolerance values, and rules for resource optimization. The computational overhead of the FWP depends on various factors, such as the number of profiles being aggregated, the size of the profiles, and the complexity of the rules.
The computational overhead can be formulated as:
Computational Overhead = Number of Nodes × (TO + PO + EO + RO + FO)
where the Transfer Overhead (TO) is the computational cost of transferring the profile data between the nodes and the server, including serialization, deserialization, and network communication; it is directly proportional to the size of the profile. The Parsing Overhead (PO) is the computational cost of parsing the input data, including XML parsing and other data transformations. The Extraction Overhead (EO) is the computational cost of extracting the relevant information from the input data, such as the DQ attributes or other features. The Rules’ Firing Overhead (RO) is the computational cost of applying the DQ rules to the extracted data, including any computations or comparisons required by the rules. The Federation Overhead (FO) is the computational cost of aggregating the results of the quality rules across multiple nodes, including any additional computations or transformations required to produce the final output.
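The overhead formula can be illustrated directly; a minimal sketch, assuming the per-profile cost of each component has already been measured (the type and function names are hypothetical, and the units are whatever the measurements use):

```python
from dataclasses import dataclass

@dataclass
class ProfileOverheads:
    transfer: float    # TO: serialization + network cost per profile
    parsing: float     # PO: XML parsing / data transformation cost
    extraction: float  # EO: pulling DQ attributes from the parsed data
    rules: float       # RO: firing the DQ rules on the extracted data
    federation: float  # FO: merging the per-node results

def computational_overhead(num_nodes: int, o: ProfileOverheads) -> float:
    """Total federation cost: Number of Nodes * (TO + PO + EO + RO + FO).
    Linear in the number of nodes, independent of the raw dataset size."""
    per_node = o.transfer + o.parsing + o.extraction + o.rules + o.federation
    return num_nodes * per_node
```

Because only the quality profile is processed, the per-node term stays constant as the dataset grows; only the node count scales the total.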
Strategies that can help with reducing the overheads include:
  • For massive XML documents, use a parser that is both lightweight and fast.
  • Save time and effort by caching parsed data.
  • Use effective algorithms and data structures to optimize the processing and aggregation of the profiles.
  • Use multithreading for parallel computation.
Note that the specific value of each overhead component will depend on the size and complexity of the input data, the number of nodes involved, and the specific implementation details of the approach. Overall, the computational overhead of EWP aggregation can be significant when the number of nodes is high, but the benefits of having a single, comprehensive Federated Workload Profile (FWP) can outweigh the costs.

6. Experimental Evaluation

This section details the experiments devised to evaluate the proposed federated resource profiling model. Quality profile federation plays a crucial role in the proposed approach: multiple quality profiles, each representing resource attributes, tolerance values, and rules for resource optimization, are aggregated into a single federated profile. This federated profile provides a comprehensive view of the system’s resources and serves as a basis for making informed decisions on resource allocation and workload management. Leveraging the federated profile and the FWP techniques, the experiments’ objective was to evaluate the FWP in terms of memory, disk space, execution time, and other resource utilization parameters at the edge nodes, such as network statistics and the impact of FWP on the ML algorithms’ tree depth, and to demonstrate that it can aid edge node selection and FPSN deployment. The following subsections describe the experimentation, including the environment setup, the dataset used, and the experimental design and scenarios. Finally, the attained results are discussed.

6.1. Dataset

The data consist of 2126 Cardiotocogram instances with 23 attributes [44,45]. These are continuous measurements of the fetal heart rate, obtained using an ultrasound transducer on the mother’s abdomen and categorized by expert obstetricians. The instantaneous fetal heart rate (FHR) and the simultaneously recorded uterine contraction signals are the data analysis parameters. The classification results were based on fetal state labels (N = normal; S = suspect; P = pathologic).

6.2. Experiment Setup

The entire project was carried out in the open-source scientific environment Spyder IDE [46], a Python-based native application. For the experimental configuration, the entire Fetal Health dataset is randomly split into six datasets, corresponding to the six edge nodes. In the current context, this split serves solely to assign each subset to an edge node and evaluate the associated resource profiles; no FL is involved.
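The random split described above might be implemented as follows; a minimal standard-library sketch, assuming the dataset rows have been loaded into a sequence (`records`). The function name and seed are illustrative, not taken from the paper's code:

```python
import random

def split_for_edges(records, n_edges=6, seed=42):
    """Randomly shuffle the dataset records and partition them into
    n_edges near-equal subsets, one per edge node (no FL involved)."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    shuffled = list(records)
    rng.shuffle(shuffled)
    base, extra = divmod(len(shuffled), n_edges)
    parts, start = [], 0
    for i in range(n_edges):
        size = base + (1 if i < extra else 0)  # spread the remainder
        parts.append(shuffled[start:start + size])
        start += size
    return parts
```

For the 2126-instance Cardiotocogram dataset, this yields two subsets of 355 rows and four of 354, one per edge node.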

6.3. Scenarios

To evaluate the proposed FWP model, five experimental scenarios were implemented. By conducting these experiments, we aimed to gain insights into the capabilities and limitations of the FWP model and to validate resource profiling optimization under different scenarios.

6.3.1. Scenario 1: Federated Workload Profiling on Memory Usage

  • Best performing ML models with respect to memory usage.
    Initially, memory usage before and after applying FWP resource optimization techniques was examined for five different ML algorithms, as shown in Figure 4.
    After implementing the FWP resource optimization algorithms, experimentation showed that memory consumption dropped significantly. Random Forest Tree, followed by GBDT and Decision Tree, were the best-performing ML models, while the memory usage of LightGBM was the highest of all the models evaluated. Since Random Forest Tree was the top-performing ML model in terms of memory consumption, the Random Forest model was employed for the rest of the experiments.
  • Memory usage prior to and after FWP at different edge nodes.
    FWP optimization strategies were implemented across six diverse nodes, as seen in Figure 5, revealing a significant drop in memory requirements in comparison with conventional approaches. Although all the nodes benefited from FWP, Nodes 5, 6, and 2, in that order, saw the greatest reductions in memory usage.
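A simple way to obtain per-model memory figures like those compared above is to trace peak heap allocation during training. The sketch below uses Python's built-in `tracemalloc` and is an illustration, not the paper's measurement harness; the function name is hypothetical:

```python
import tracemalloc

def peak_training_memory(fit_fn):
    """Measure peak Python-heap allocation (in bytes) during a training
    call. fit_fn is a zero-argument callable that trains the model."""
    tracemalloc.start()
    try:
        fit_fn()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak
```

For example, `peak_training_memory(lambda: RandomForestClassifier().fit(X, y))` would report the peak allocation of a scikit-learn Random Forest fit; running it once per algorithm, before and after optimization, yields a comparison like the one in Figure 4. Note that `tracemalloc` only sees Python-heap allocations, so natively allocated buffers (e.g., inside compiled libraries) require an OS-level tool such as `psutil` instead.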

6.3.2. Scenario 2: Federated Workload Profiling on Various Resource Parameters at the Node Level

  • Disk space usage prior to and after FWP at different edge nodes.
    The FWP experiments’ impact on disk space consumption was comparable with that of the previous experiments, as evident from Figure 6. All of the nodes showed lower disk space usage after the FWP experiments, although the disparity in disk space usage was not large.
  • Network I/O statistics prior to and after FWP
    Each edge node’s network I/O data were recorded, including bytes sent, bytes received, packets transmitted, packets received, total incoming dropped packets, and total outgoing dropped packets. This I/O data analysis revealed crucial insights into the system’s overall performance and helped identify potential bottlenecks and issues. After applying our proposed optimization strategies, which included resource-aware data reduction methods such as compression, protocol optimization, and caching, visual inspection confirmed a decrease in bytes sent and received after implementing FWP, as shown in Figure 7.
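Per-node network deltas of the kind recorded above can be computed from two counter snapshots; a minimal sketch, assuming the snapshots are dictionaries such as those produced by `psutil.net_io_counters()._asdict()` (the field names follow psutil's convention; the helper name is hypothetical):

```python
# Counter fields tracked per edge node, matching psutil's naming:
# bytes/packets sent and received, plus dropped packets in and out.
FIELDS = ("bytes_sent", "bytes_recv", "packets_sent",
          "packets_recv", "dropin", "dropout")

def io_delta(before, after):
    """Difference between two network-counter snapshots taken at the
    start and end of a measurement window (e.g., one FWP run)."""
    return {f: after[f] - before[f] for f in FIELDS}
```

Taking one snapshot before and one after a training round gives exactly the per-node "bytes sent/received" comparison visualized in Figure 7.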

6.3.3. Scenario 3: The Cumulative Impact of Federated Workload Profiling across Expanding Node Networks

Extending the Scenario 2 experiments, an assessment was conducted to evaluate the cumulative impacts of optimization strategies on disk space, memory, and network performance as the number of nodes in the federation increases. The size of a federation can vary widely, ranging from a few nodes to hundreds or even thousands, depending on the specific network or system architecture. The number of nodes in a federation often impacts various aspects of network management, performance, and resource allocation.
  • FWP impact on cumulative memory with increasing nodes.
    In this experiment, we explore the cumulative memory usage across ten nodes before and after implementing FWP and its impact as the number of nodes increases.
    Figure 8 illustrates the cumulative memory usage across ten nodes both before and after the implementation of FWP. The blue line represents the memory consumption before FWP and shows a steady increase as additional nodes are incorporated into the system. The orange line, depicting memory usage after FWP, displays a similar upward trend but with consistently lower values than the pre-FWP scenario. This suggests that FWP effectively optimizes memory resources, reducing memory utilization as the number of nodes in the federation increases. The graph highlights the positive impact of FWP on memory management, showcasing its ability to scale memory consumption efficiently with the growing number of nodes, potentially leading to enhanced system performance and resource allocation.
  • FWP effect on cumulative disk space with node expansion.
    This experiment investigates the effects of FWP on cumulative disk space usage across an expanding node federation.
    Figure 9 provides a visual representation of cumulative disk space usage across ten nodes, both before and after the implementation of FWP. The cumulative disk space usage is presented in gigabytes (GB) and is measured as the sum of disk space occupied by each node in the federation. The blue line represents the cumulative disk space consumption before FWP, showcasing the gradual increase in storage usage as more nodes are incorporated into the network. Each point on the line corresponds to a specific node, illustrating the progressive accumulation of data storage requirements across the network. The orange line represents cumulative disk space usage after FWP activation and shows a similar growth trend as more nodes join the network. The graph utilizes a dual-axis approach, with separate y-axes on the left and right sides, to present cumulative disk space usage before and after the implementation of FWP. This separation prevents the two series from overlapping, ensuring that both trends can be easily examined and compared.
    The graph suggests that the difference in cumulative disk space before and after the implementation of FWP is generally small. In practical terms, this indicates that FWP did not significantly change cumulative disk space usage across the ten nodes: it moderately reduced, but did not substantially alter, the total storage requirements of the system, at least within the observed scope of these ten nodes.
  • FWP’s influence on network I/O with increasing federation nodes.
    In this experiment, we investigate the effects of FWP on network I/O statistics as the number of nodes within the federation expands.
    Figure 10 offers a comprehensive perspective on network I/O (Input/Output) statistics for ten nodes, featuring four key components. The blue bars denote cumulative bytes sent before the introduction of Federated Workload Profiling (FWP), while the green bars represent cumulative bytes received before FWP. Similarly, the orange bars depict cumulative bytes sent after FWP, and the red bars showcase cumulative bytes received after FWP. Additionally, the graph incorporates purple bars, which represent the cumulative difference in bytes sent, and cyan bars that represent the cumulative difference in bytes received. The two y-axes in the graph serve distinct purposes: the primary y-axis represents cumulative network traffic volumes, while the secondary y-axis illustrates the cumulative differences in network activity before and after the implementation of Federated Workload Profiling (FWP). The cumulative difference bars on the secondary y-axis appear relatively higher, mainly because they are scaled using a factor to emphasize even subtle variations in network behavior. This comprehensive visualization aids in understanding the impact of FWP on network data flow across multiple nodes.
    The graph demonstrates a notable network traffic pattern. Initially, there is a reduction in data bytes sent and received at Node 1 after implementing FWP. However, as more nodes join the federation, there is a consistent increase in network activity. Notably, the cumulative difference bars (purple and cyan) remain positive, highlighting FWP’s ability to effectively manage and optimize data traffic as the network scales. This trend suggests significant network traffic savings when FWP is employed with a larger number of nodes, enhancing network efficiency and resource utilization.

6.3.4. Scenario 4: Federated Resource Profiling on Tree Depth

  • Faster convergence of ML algorithms.
    When compared with the conventional approaches, the execution time of each of the five machine learning algorithms evaluated with the proposed FWP was found to be significantly reduced, as shown in Figure 11. According to the results, Decision Tree and Random Forest outperformed the other ML algorithms and converged more quickly. Although GBDT was the worst-performing model with the longest execution time, our proposed FWP cut its execution time almost in half.
  • Memory utilization
    Machine learning algorithms such as Random Forest have memory requirements that vary with the number and size of their trees. With large or complex datasets, a tree may be quite deep and contain thousands of nodes, and in such circumstances memory usage grows exponentially with depth. Reducing the depth of the tree therefore reduces the amount of memory used.
    Figure 12 provides a visual representation of how the Random Forest tree depth affected execution time when FWP was utilized. This reduction in tree depth explains the decreased resource utilization following FWP implementation.
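The exponential relationship between tree depth and memory can be made concrete with a worst-case node count; a small illustration (the function names are hypothetical, and in scikit-learn the depth would be capped via the estimator's `max_depth` parameter):

```python
def max_tree_nodes(depth):
    """Upper bound on the node count of a binary decision tree of the
    given depth: 2^(depth+1) - 1 nodes, so memory that is proportional
    to node count grows exponentially with depth."""
    return 2 ** (depth + 1) - 1

def memory_saving_ratio(full_depth, capped_depth):
    """Worst-case fraction of tree nodes (and hence node storage)
    removed by capping the tree at a shallower depth."""
    return 1 - max_tree_nodes(capped_depth) / max_tree_nodes(full_depth)
```

Even a modest cap pays off: shaving three levels off a depth-12 tree removes, in the worst case, well over 80% of the potential nodes, which is consistent with the memory reductions observed after FWP.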

6.3.5. Scenario 5: Federated Workload Profiling Aiding in Edge Node Selection and FPSN

Depending on the application, edge nodes can be mobile or static in edge computing. Mobility reduces the efficiency with which edges can exploit their resources, which can lead the FWP to decide which edge nodes should be removed. To keep resource utilization and performance at their peak, the algorithm also takes into account the reputation of each node, keeping only the most trustworthy edge nodes even if they are mobile. The response time; metric scores related to CPU, memory, network, sensors, processes, and storage; reputation score; mobility; etc., all factor into the weighted average score calculated at the server and used as a selection criterion for the EWP nodes. If two nodes have the same weighted average score, the server prioritizes the node with the highest reputation score, followed by the node with the lowest response time, the lowest mobility, and the best CPU and memory availability, and selects the best n edges. Ultimately, the weight given to each factor depends on the specific requirements of the system and the trade-offs between aspects such as performance, scalability, and cost. It is important to continuously evaluate and refine the approach based on real-world performance to ensure optimal results.
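The weighted-average scoring and tie-breaking described above can be sketched as follows; a simplified illustration (function names hypothetical), assuming all metrics are normalized to [0, 1] with higher values better (so response time and mobility are already inverted), and showing only the first two tie-breakers of the chain:

```python
def node_score(metrics, weights):
    """Weighted average selection score for one edge node.
    metrics/weights: dicts over factors such as cpu, memory, network,
    reputation, response_time, mobility; metrics not named in weights
    are ignored."""
    total_w = sum(weights.values())
    return sum(metrics[k] * weights[k] for k in weights) / total_w

def select_best_nodes(nodes, weights, n):
    """Pick the top-n nodes by weighted score; ties are broken by
    reputation first, then by (normalized, higher-is-better) response
    time, mirroring the selection criterion at the server."""
    ranked = sorted(
        nodes,
        key=lambda m: (node_score(m, weights),
                       m.get("reputation", 0.0),
                       m.get("response_time", 0.0)),
        reverse=True,
    )
    return ranked[:n]
```

In a real deployment the tie-break chain would continue with mobility and CPU/memory availability, and the weights would be tuned to the system's performance, scalability, and cost trade-offs.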
Because of its poor response time, high mobility, excessive congestion, and low reputation, Edge Node 6 was removed after FWP was applied. Since time is of the essence in FPSN calculation, only the most resource-rich nodes were selected. To reduce response time and obtain a precise performance measurement while maintaining high accuracy, only trustworthy nodes, selected by factoring in their reputation, were used in our FPSN calculation. It is evident that the accuracy performance measure slightly improved after the resource optimizations of FWP, which was applied at the nodes and used in the FPSN calculations, as shown in Figure 13.

6.4. Results and Discussion

In this study, we presented five different scenarios that utilized Federated Workload Profiling (FWP) to optimize the resource utilization of edge computing nodes.
In Scenario 1, we evaluated the memory usage of five different machine learning models before and after applying FWP optimization techniques. On average, applying FWP led to a 47.8% reduction in memory usage across all machine learning algorithms. The algorithms that benefited the most from FWP in terms of memory usage reduction were Random Forest Tree and Decision Tree, with reductions of 73.9% and 69.3%, respectively. XGBoost and LightGBM had the smallest reductions in memory usage, at 53% and 43.5%, respectively. We found that Random Forest Tree was the best-performing model in terms of memory consumption and used it for the rest of the experiments.
In Scenario 2, we analyzed the impact of FWP on disk space usage and network I/O statistics. Our proposed optimization strategies resulted in slightly lower disk space usage and decreased network I/O after implementing FWP. Specifically, the total disk space used decreased by 336,304 bytes. The network I/O statistics analysis shows that, on average, the number of bytes sent by a node decreased by 2,623,071 bytes (about 1.4%), while the number of bytes received by a node decreased by 40,266 bytes (about 0.04%).
In Scenario 3, the cumulative impact of FWP across expanding node networks was examined. As the number of nodes in the federation increased, FWP demonstrated its effectiveness in optimizing memory resources, resulting in a notable reduction in memory utilization of approximately 9%. Additionally, FWP showcased its ability to efficiently manage network traffic, leading to significant network traffic savings of around 27%. These findings underscore the scalability and positive impact of FWP in enhancing system performance and resource allocation as federated network nodes increase.
In Scenario 4, we focused on evaluating the impact of FWP on the convergence time and memory utilization of machine learning algorithms. Our results showed that the execution time of the evaluated algorithms was significantly reduced, with Decision Tree and Random Forest performing the best in terms of faster convergence. The execution time decreased after FWP was applied, with percentage decreases ranging from 19.2% to 62.3%. Specifically, the execution times for Random Forest, Decision Tree, GBDT, XGBoost, and LightGBM decreased by 19.2%, 33.3%, 53.9%, 62.3%, and 57.9%, respectively. For the Random Forest tree depth and training time before and after FWP, we found that the average tree depth decreased by 3.34, or about 27.5%, after FWP was applied, which in turn leads to a reduction in memory usage. Similarly, the average execution time decreased by 24.68 ms, or about 20.4%, after FWP was applied. These results suggest that FWP had a significant positive impact on both tree depth and training time for Random Forest across all nodes.
Finally, in Scenario 5, we utilized FWP to aid in edge node selection and FPSN accuracy calculation. We removed an underperforming edge node and only selected trustworthy nodes to use in our FPSN calculation. On average, applying FWP led to a 6.8% increase in accuracy across all nodes. However, it is worth noting that the degree of improvement varied across different nodes, with Node 1 having the largest increase in accuracy (8%), and Node 4 having the smallest increase (4%).
Overall, the results of our experiments demonstrate the effectiveness of FWP in optimizing the resource utilization of edge computing nodes. The implementation of our proposed optimization strategies yielded substantial reductions in memory usage and modest improvements in disk space efficiency. Concurrently, these strategies enhanced network I/O performance and led to more efficient convergence times for machine learning algorithms. Furthermore, the FWP approach can be used to aid in edge node selection and improve the accuracy of FPSN.

7. Future Directions

As part of our ongoing efforts to enhance FL systems, we will focus on several key areas of development that align with the evolving challenges in the field. Our future research will concentrate on the following advancements:
Integrating Statistical Heterogeneity and Optimizing Edge Reputation in FL:
In our future work, we aim to incorporate statistical heterogeneity into our resource-aware federated profiling framework, drawing inspiration from methodologies in studies such as Oort [47] and PyramidFL [48]. This will enhance our model’s adaptability in environments with diverse data distributions across nodes. Simultaneously, we plan to optimize the edge reputation factor, which assesses the historical performance and contribution of nodes. This optimization will include integrating advanced utility profiling mechanisms that adapt to the changing performance and contributions of nodes over time, ensuring more effective and reliable node selection.
Addressing Computational Stragglers in Federated Learning:
We intend to tackle the challenge of computational stragglers, employing strategies such as AD-PSGD algorithm [49] and Prague [50], for efficient information exchange and robust asynchronous operations. AD-PSGD algorithm [49], aims to enhance the overall efficiency, particularly in dealing with nodes that exhibit slower or irregular computational performance. The Prague algorithm [50], a high-performance, heterogeneity-aware training method that enhances synchronization efficiency and reduces conflicts, using advanced techniques like Partial All-Reduce and static group scheduling make it an ideal model for optimizing federated learning systems and managing computational stragglers.
Expanding Experimentation with Diverse Datasets:
Our experiments are currently limited to a specific, homogeneous dataset, highlighting the need for future research to include more varied and heterogeneous data to comprehensively assess the proposed design’s effectiveness and scalability in diverse real-world scenarios. To validate and refine our approach, in the future we will employ a range of datasets in our experiments. This will enable us to assess the robustness and scalability of our design under different conditions and conduct comprehensive comparisons with state-of-the-art resource optimization strategies and systems.
Enhancing Fine-grained Client Selection:
We will also explore fine-grained client selection strategies to improve time-to-accuracy performance in FL systems. A key technique we plan to incorporate is client clustering based on data characteristics. This method involves grouping clients with similar data distributions or computational capabilities, allowing for more targeted and efficient model updates. This approach will focus on exploiting data and system heterogeneity within selected clients for more efficient utility profiling, leading to improved model training efficiency and effectiveness.
In addition to the areas mentioned, further research is critical to fully evaluate the efficacy and scalability of our system in real-world scenarios, particularly in addressing privacy and security concerns. As improving node collaboration and resource utilization remains a top priority, future developments will focus on enhancing the FWP to allow for resource offloading to resource-rich nodes. By doing so, we aim to maximize the efficiency of resource allocation across the network, ensuring optimal utilization of available resources while maintaining system integrity and security.

8. Conclusions

As a result of digital transformation, healthcare is becoming decentralized, and reaching out to the network’s edge will allow us to better capture data, meet the demands of patients, and foster innovation. The research highlights the importance of optimizing node collaboration and resource utilization, with a specific focus on resource profile federation and resource optimization evaluation. FWP and node reputation are two effective measures to ensure optimum edge node selection. To analyze and optimize the resources and select appropriate edge nodes, edge workload profiling and Federated Workload Profiling are proposed to support the deployment of FPSN. Time is often a deciding factor in FPSN and PSN deployments; consequently, our proposed model was successful in selecting the nodes with the quickest response times by facilitating faster convergence of the ML algorithms through resource optimizations. The experimentation conducted on various scenarios validates the effectiveness and performance of the proposed FWP model, showcasing its potential for improving resource utilization in a distributed edge computing environment through reputation-based node selection and a resource-aware federated hybrid profiling strategy. Overall, the proposed model successfully selected edge nodes with quick response times, facilitating faster convergence of machine learning algorithms and enabling more effective, time-critical, and decision-intensive health applications. Although the computational overhead may increase with the number of nodes, our approach prioritizes a lightweight resource profile federation, ensuring that this potential increase does not pose a significant concern given our focus on efficient resource utilization.

Author Contributions

A.N.N. conceived the main conceptual ideas related to the resource-aware FPSN, the architecture, the literature review, and the overall implementation and execution of the experiments. H.T.E.K. contributed to the formal modeling, the literature review, and the analysis of the results. M.A.S. contributed to the architecture of the model and the edge resources classification, and he ensured that the study was carried out with utmost care and attention to detail, while overseeing the overall direction and planning. E.S.B. was involved in the general evaluation of the proposed model. All authors contributed to the writing of the manuscript and the revision and proofreading of the final version of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the Zayed Health Science Center under fund # 12R005.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset is openly available at UCI Machine Learning Repository, https://archive.ics.uci.edu/ml/datasets/Cardiotocography.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
CPU	Central Processing Unit
EWP	Edge Workload Profiling
EWPM	Edge Workload Profiling Module
FL	Federated Learning
FWP	Federated Workload Profiling
FPSN	Federated Patient Similarity Network
GBDT	Gradient Boosted Decision Trees
GPU	Graphics Processing Unit
HFL	Hierarchical Federated Learning
I/O	Input/Output
LightGBM	Light Gradient Boosting Machine
ML	Machine Learning
MCDM	Multiple-Criteria Decision Making Model
NSM	Node Selection Module
NRMA	Node Resource Monitoring Agent
PSN	Patient Similarity Network
QoS	Quality of Service
RPM	Revolutions Per Minute

References

  1. Rieke, N.; Hancox, J.; Li, W.; Milletarì, F.; Roth, H.R.; Albarqouni, S.; Bakas, S.; Galtier, M.N.; Landman, B.A.; Maier-Hein, K.; et al. The future of digital health with federated learning. NPJ Digit. Med. 2020, 3, 119. [Google Scholar] [CrossRef]
  2. Abreha, H.G.; Hayajneh, M.; Serhani, M.A. Federated Learning in Edge Computing: A Systematic Survey. Sensors 2022, 22, 450. [Google Scholar] [CrossRef]
  3. Li, T.; Sahu, A.K.; Talwalkar, A.; Smith, V. Federated Learning: Challenges, Methods, and Future Directions. IEEE Signal Process. Mag. 2020, 37, 50–60. [Google Scholar] [CrossRef]
  4. Konečny, J.K.; Brendan, H.; Google, M.; Ramage Google, D.; Richtárik, P. Federated Optimization: Distributed Machine Learning for On-Device Intelligence. arXiv 2016, arXiv:1610.02527. [Google Scholar]
  5. Pai, S.; Bader, G.D. Patient Similarity Networks for Precision Medicine. J. Mol. Biol. 2018, 430, 2924–2938. [Google Scholar] [CrossRef]
  6. Parimbelli, E.; Marini, S.; Sacchi, L.; Bellazzi, R. Patient similarity for precision medicine: A systematic review. J. Biomed. Inform. 2018, 83, 87–96. [Google Scholar] [CrossRef]
  7. Brown, S.A. Patient Similarity: Emerging Concepts in Systems and Precision Medicine. Front. Physiol. 2016, 7, 561. [Google Scholar] [CrossRef]
  8. Gottlieb, A.; Stein, G.Y.; Ruppin, E.; Altman, R.B.; Sharan, R. A method for inferring medical diagnoses from patient similarities. BMC Med. 2013, 11, 194. [Google Scholar] [CrossRef]
  9. El Kassabi, H.T.; Adel Serhani, M.; Navaz, A.N.; Ouhbi, S. Federated Patient Similarity Network for Data-Driven Diagnosis of COVID-19 Patients. In Proceedings of the 2021 IEEE/ACS 18th International Conference on Computer Systems and Applications (AICCSA), Tangier, Morocco, 30 December–3 November 2021; pp. 1–6. [Google Scholar] [CrossRef]
  10. Navaz, A.N.; El Kassabi, H.T.; Serhani, M.A.; Oulhaj, A.; Khalil, K. A Novel Patient Similarity Network (PSN) Framework Based on Multi-Model Deep Learning for Precision Medicine. J. Pers. Med. 2022, 12, 768. [Google Scholar] [CrossRef]
  11. Zhan, Y.; Li, P.; Guo, S. Experience-Driven Computational Resource Allocation of Federated Learning by Deep Reinforcement Learning. In Proceedings of the 2020 IEEE 34th International Parallel and Distributed Processing Symposium, IPDPS 2020, New Orleans, LA, USA, 18–22 May 2020; pp. 234–243. [Google Scholar] [CrossRef]
  12. Wang, H.; Kaplan, Z.; Niu, D.; Li, B. Optimizing Federated Learning on Non-IID Data with Reinforcement Learning. Proc. IEEE INFOCOM 2020, 2020, 1698–1707. [Google Scholar] [CrossRef]
  13. Lim, W.Y.B.; Ng, J.S.; Xiong, Z.; Niyato, D.; Miao, C.; Kim, D.I. Dynamic Edge Association and Resource Allocation in Self-Organizing Hierarchical Federated Learning Networks. IEEE J. Sel. Areas Commun. 2021, 39, 3640–3653. [Google Scholar] [CrossRef]
  14. Li, T.; Sanjabi, M.; Beirami, A.; Smith, V. Fair Resource Allocation in Federated Learning. arXiv 2019, arXiv:1905.10497. [Google Scholar]
  15. Xia, W.; Quek, T.Q.; Guo, K.; Wen, W.; Yang, H.H.; Zhu, H. Multi-armed bandit-based client scheduling for federated learning. IEEE Trans. Wirel. Commun. 2020, 19, 7108–7123. [Google Scholar] [CrossRef]
  16. Huang, W.; Han, Z.; Zhao, L.; Xu, H.; Li, Z.; Wang, Z. Resource allocation for intelligent reflecting surfaces assisted federated learning system with imperfect CSI. Algorithms 2021, 14, 363. [Google Scholar] [CrossRef]
  17. Anelli, V.W.; Deldjoo, Y.; Di Noia, T.; Ferrara, A. Towards Effective Device-Aware Federated Learning. Lect. Notes Comput. Sci. 2019, 11946, 477–491. [Google Scholar] [CrossRef]
  18. Chai, Z.; Fayyaz, H.; Fayyaz, Z.; Anwar, A.; Zhou, Y.; Baracaldo, N.; Ludwig, H.; Cheng, Y. Towards Taming the Resource and Data Heterogeneity in Federated Learning. In Proceedings of the 2019 USENIX Conference on Operational Machine Learning (OpML ’19), Santa Clara, CA, USA, 20 May 2019; pp. 1–4. [Google Scholar]
  19. Xu, Z.; Yu, F.; Xiong, J.; Chen, X. ELFISH: Resource-Aware Federated Learning on Heterogeneous Edge Devices. Proc. Des. Autom. Conf. 2021, 2021, 997–1002. [Google Scholar] [CrossRef]
  20. Chai, Z.; Ali, A.; Zawad, S.; Truex, S.; Anwar, A.; Baracaldo, N.; Zhou, Y.; Ludwig, H.; Yan, F.; Cheng, Y. TiFL: A Tier-based Federated Learning System. In Proceedings of the HPDC 2020—29th International Symposium on High-Performance Parallel and Distributed Computing, Stockholm, Sweden, 23–26 June 2020; pp. 125–136. [Google Scholar] [CrossRef]
  21. Khan, L.U.; Saad, W.; Han, Z.; Hong, C.S. Dispersed Federated Learning: Vision, Taxonomy, and Future Directions. IEEE Wirel. Commun. 2021, 28, 192–198. [Google Scholar] [CrossRef]
  22. Pilla, L.L. Optimal task assignment for heterogeneous federated learning devices. In Proceedings of the 2021 IEEE 35th International Parallel and Distributed Processing Symposium, Portland, OR, USA, 17–21 May 2021; pp. 661–670. [Google Scholar] [CrossRef]
  23. Huang, T.; Lin, W.; Wu, W.; He, L.; Li, K.; Zomaya, A.Y. An Efficiency-Boosting Client Selection Scheme for Federated Learning with Fairness Guarantee. IEEE Trans. Parallel Distrib. Syst. 2021, 32, 1552–1564. [Google Scholar] [CrossRef]
  24. Chen, M.; Shlezinger, N.; Poor, H.V.; Eldar, Y.C.; Cui, S. Communication-efficient federated learning. Proc. Natl. Acad. Sci. USA 2021, 118, e2024789118. [Google Scholar] [CrossRef]
  25. Jeon, J.; Park, S.; Choi, M.; Kim, J.; Kwon, Y.B.; Cho, S. Optimal user selection for high-performance and stabilized energy-efficient federated learning platforms. Electronics 2020, 9, 1359. [Google Scholar] [CrossRef]
  26. Albaseer, A.; Abdallah, M.; Al-Fuqaha, A.; Erbad, A. Client Selection Approach in Support of Clustered Federated Learning over Wireless Edge Networks. In Proceedings of the 2021 IEEE Global Communications Conference, GLOBECOM 2021, Madrid, Spain, 7–11 December 2021. [Google Scholar] [CrossRef]
  27. Xu, J.; Wang, H. Client Selection and Bandwidth Allocation in Wireless Federated Learning Networks: A Long-Term Perspective. IEEE Trans. Wirel. Commun. 2021, 20, 1188–1200. [Google Scholar] [CrossRef]
  28. Buyya, R.; Srirama, S.N.; Mahmud, R.; Goudarzi, M.; Ismail, L.; Kostakos, V. Quality of Service (QoS)-Driven Edge Computing and Smart Hospitals: A Vision, Architectural Elements, and Future Directions. In Proceedings of the NIELIT’s International Conference on Communication, Electronics and Digital Technology; Lecture Notes in Networks and Systems Book Series; Springer: Singapore, 2023; Volume 676, pp. 1–23. [Google Scholar] [CrossRef]
  29. Jana, G.C.; Banerjee, S. Enhancement of QoS for fog computing model aspect of robust resource management. In Proceedings of the 2017 International Conference on Intelligent Computing, Instrumentation and Control Technologies, ICICICT 2017, Kannur, India, 6–7 July 2017; pp. 1462–1466. [Google Scholar] [CrossRef]
  30. ITU-T. E.800-Series: Guidelines on Regulatory Aspects of QoS; ITU-T Publications: Geneva, Switzerland, 2021. [Google Scholar]
  31. Poryazov, S.A.; Saranova, E.T.; Andonov, V.S. Overall Model Normalization towards Adequate Prediction and Presentation of QoE in Overall Telecommunication Systems. In Proceedings of the 2019 14th International Conference on Advanced Technologies, Systems and Services in Telecommunications, TELSIKS 2019, Nis, Serbia, 23–25 October 2019; pp. 360–363. [Google Scholar] [CrossRef]
  32. Ashouri, M.; Lorig, F.; Davidsson, P.; Spalazzese, R. Edge computing simulators for iot system design: An analysis of qualities and metrics. Future Internet 2019, 11, 235. [Google Scholar] [CrossRef]
  33. Aslanpour, M.S.; Gill, S.S.; Toosi, A.N. Performance evaluation metrics for cloud, fog and edge computing: A review, taxonomy, benchmarks and standards for future research. Internet Things 2020, 12, 100273. [Google Scholar] [CrossRef]
  34. Cernak, J.; Cernakova, E.; Kocan, M. Performance testing of distributed computational resources in the software development phase. In Proceedings of the EGI Community Forum 2012/EMI Second Technical Conference (EGICF12-EMITC2), Munich, Germany, 26–30 March 2012. [Google Scholar] [CrossRef]
  35. Shekhar, S.; Gokhale, A. Dynamic resource management across cloud-edge resources for performance-sensitive applications. In Proceedings of the 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2017, Madrid, Spain, 14–17 May 2017; pp. 707–710. [Google Scholar] [CrossRef]
  36. Preeth, E.N.; Mulerickal, J.P.; Paul, B.; Sastri, Y. Evaluation of Docker containers based on hardware utilization. In Proceedings of the 2015 International Conference on Control, Communication and Computing India, ICCC 2015, Trivandrum, India, 19–21 November 2015; pp. 697–700. [Google Scholar] [CrossRef]
  37. Dautov, R.; Distefano, S.; Bruneo, D.; Longo, F.; Merlino, G.; Puliafito, A. Data agility through clustered edge computing and stream processing. Concurr. Comput. Pract. Exp. 2021, 33, 1. [Google Scholar] [CrossRef]
  38. Callou, G.; MacIel, P.; Magnani, F.; Figueiredo, J.; Sousa, E.; Tavares, E.; Silva, B.; Neves, F.; Araujo, C. Estimating sustainability impact, total cost of ownership and dependability metrics on data center infrastructures. In Proceedings of the 2011 IEEE International Symposium on Sustainable Systems and Technology, ISSST 2011, Chicago, IL, USA, 16–18 May 2011. [Google Scholar] [CrossRef]
  39. Dao, N.N.; Lee, J.; Vu, D.N.; Paek, J.; Kim, J.; Cho, S.; Chung, K.S.; Keum, C. Adaptive Resource Balancing for Serviceability Maximization in Fog Radio Access Networks. IEEE Access 2017, 5, 14548–14559. [Google Scholar] [CrossRef]
  40. Zhang, J.; Chen, B.; Zhao, Y.; Cheng, X.; Hu, F. Data Security and Privacy-Preserving in Edge Computing Paradigm: Survey and Open Issues. IEEE Access 2018, 6, 18209–18237. [Google Scholar] [CrossRef]
  41. Shen, X.; Jiang, H.; Chen, Y.; Wang, B.; Gao, L. PLDP-FL: Federated Learning with Personalized Local Differential Privacy. Entropy 2023, 25, 485. [Google Scholar] [CrossRef]
  42. Wang, Y.; Che, T.; Zhao, X.; Zhou, T.; Zhang, K.; Hu, X. A Blockchain-Based Privacy Information Security Sharing Scheme in Industrial Internet of Things. Sensors 2022, 22, 3426. [Google Scholar] [CrossRef]
  43. Alanzi, H.; Alkhatib, M. Towards Improving Privacy and Security of Identity Management Systems Using Blockchain Technology: A Systematic Review. Appl. Sci. 2022, 12, 12415. [Google Scholar] [CrossRef]
  44. Campos, D.; Bernardes, J. Cardiotocography. In UCI Machine Learning Repository; The University of California, Irvine (UCI): Irvine, CA, USA, 2010. [Google Scholar] [CrossRef]
  45. Ayres-de-Campos, D.; Bernardes, J.; Garrido, A.; Marques-de-Sá, J.; Pereira-Leite, L. SisPorto 2.0: A Program for Automated Analysis of Cardiotocograms. J. Matern.-Fetal Neonatal Med. 2000, 9, 311–318. [Google Scholar] [CrossRef]
  46. Spyder IDE. Available online: https://www.spyder-ide.org/ (accessed on 3 December 2023).
  47. Lai, F.; Zhu, X.; Madhyastha, H.V.; Chowdhury, M. Oort: Efficient federated learning via guided participant selection. In Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2021, Virtual, 14–16 July 2021; pp. 19–35. [Google Scholar]
  48. Li, C.; Zeng, X.; Zhang, M.; Cao, Z. PyramidFL: A Fine-grained Client Selection Framework for Efficient Federated Learning. In Proceedings of the Annual International Conference on Mobile Computing and Networking, MOBICOM 2022, Sydney, NSW, Australia, 17–21 October 2022; pp. 158–171. [Google Scholar] [CrossRef]
  49. Lian, X.; Zhang, W.; Zhang, C.; Liu, J. Asynchronous decentralized parallel stochastic gradient descent. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholm, Sweden, 10–15 July 2018; pp. 4745–4767. [Google Scholar]
  50. Luo, Q.; He, J.; Zhuo, Y.; Qian, X. Prague: High-performance heterogeneity-aware asynchronous decentralized training. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems—ASPLOS, Lausanne, Switzerland, 16–20 March 2020; pp. 401–416. [Google Scholar] [CrossRef]
Figure 1. Edge resource metrics classification.
Figure 2. The Federated Workload Profiling model.
Figure 3. Edge Workload Profile.
Figure 4. ML algorithms memory usage comparison.
Figure 5. Effect of FWP on memory usage across nodes.
Figure 6. Impact of FWP on disk space.
Figure 7. Effect of FWP on network I/O.
Figure 8. Cumulative memory usage before and after FWP.
Figure 9. Cumulative disk space consumption before and after FWP.
Figure 10. Cumulative network I/O stats before and after FWP.
Figure 11. Execution time before and after FWP.
Figure 12. Effects of FWP on ML tree depth and execution time.
Figure 13. FWP-enabled FPSN performance.
Table 1. Edge resource metrics.
| Edge Resource | Metrics | Description | Measurement Unit |
|---|---|---|---|
| CPU | Processing Speed | The number of instructions per second the computer executes. | MHz |
| CPU | Utilization | Current system-wide CPU utilization. | Percentage |
| CPU | Frequency | Includes the current, minimum, and maximum CPU frequencies. | MHz |
| CPU | Counts | The system's logical CPU count. | Number |
| Process | Creation Time | The process's creation time, in seconds since the epoch. | s |
| Process | Execution Time | The time the system spends executing the task. | s |
| Process | Timeout | The time to wait for a process's PID to terminate. | s |
| Memory | Available | The amount of RAM that can be made available to programs without forcing the operating system to swap. | bytes |
| Memory | Cached | Cache memory used for numerous purposes. | bytes |
| Memory | Usage | The amount of memory in use. | bytes |
| Memory | Size | Total physical memory. | bytes |
| Memory | Shared | Memory that multiple processes may access concurrently. | bytes |
| Network | I/O | System-wide network I/O statistics: bytes sent, bytes received, packets sent, and packets received. | bytes/packets |
| Network | Latency/Delay | The time it takes to transfer data from one destination to another. | s |
| Network | Throughput | The number of data packets successfully delivered to the destination, which varies with the network area; dropped packets must be retransmitted when throughput is low. | bits per second (bps) |
| Network | Bandwidth | The maximum data transfer rate or capacity of a given network, used to gauge network speed and quality. | bits per second (bps) |
| Sensors | Battery Level | Battery power remaining. | Percentage |
| Sensors | Battery Secs Left | Approximate seconds left before the battery runs out of power. | s |
| Sensors | Battery Power Plugged | Whether the AC power cord is plugged in. | Boolean |
| Sensors | Fans | Hardware fan speed. | RPM |
| Sensors | Temperature | The hardware's current temperature. | Celsius |
| Storage | Disk Space | Total disk space. | bytes |
| Storage | Disk Usage | Disk space used. | bytes |
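Several of the metrics in Table 1 can be sampled programmatically on an edge node. As a minimal sketch, the snippet below collects a small, portable subset using only the Python standard library; richer dynamic metrics such as CPU utilization, available memory, network I/O counters, and battery state would typically come from a third-party library such as psutil (e.g., `psutil.cpu_percent()`, `psutil.virtual_memory()`, `psutil.sensors_battery()`). The function and key names are illustrative, not the paper's implementation.

```python
import os
import shutil
import time

def collect_edge_metrics(path="/"):
    """Sample a portable subset of the Table 1 edge resource metrics.

    Standard-library calls only; dynamic metrics such as CPU
    utilization or battery level would normally come from psutil.
    """
    disk = shutil.disk_usage(path)  # Storage: total and used disk space
    return {
        "cpu_count_logical": os.cpu_count(),            # CPU: Counts
        "disk_total_bytes": disk.total,                 # Storage: Disk Space
        "disk_used_bytes": disk.used,                   # Storage: Disk Usage
        "disk_used_pct": 100.0 * disk.used / disk.total,
        "sampled_at_s": time.time(),                    # seconds since the epoch
    }

profile = collect_edge_metrics()
```

In a federated workload profiling loop, each edge node would sample such a dictionary periodically and report it to the aggregator alongside model updates.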
Table 2. Resources and their suggested thresholds.
| Edge Resource | Threshold |
|---|---|
| CPU | 80% |
| Memory | 70% |
| Network | 90% |
| Sensors | 95% |
| Storage | 80% |
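The suggested thresholds above can drive a simple eligibility filter for edge node selection: a node qualifies for a training round only if every measured utilization stays below its per-resource threshold. The sketch below illustrates this idea; the function names and data layout are hypothetical, not the paper's implementation.

```python
# Suggested per-resource utilization thresholds (in percent) from Table 2.
THRESHOLDS = {
    "cpu": 80.0,
    "memory": 70.0,
    "network": 90.0,
    "sensors": 95.0,
    "storage": 80.0,
}

def is_eligible(node_usage, thresholds=THRESHOLDS):
    """True if every reported utilization stays below its threshold."""
    return all(node_usage.get(res, 0.0) < limit
               for res, limit in thresholds.items())

def select_nodes(nodes, thresholds=THRESHOLDS):
    """Filter a {node_id: usage_dict} map down to eligible nodes."""
    return [nid for nid, usage in nodes.items()
            if is_eligible(usage, thresholds)]

# Example: edge-2 exceeds the 80% CPU threshold and is excluded.
nodes = {
    "edge-1": {"cpu": 45.0, "memory": 60.0, "network": 20.0, "storage": 50.0},
    "edge-2": {"cpu": 92.0, "memory": 55.0, "network": 10.0, "storage": 40.0},
}
eligible = select_nodes(nodes)  # ["edge-1"]
```

A resource missing from a node's report is treated as idle (0%) here; a production selector would more likely treat a missing measurement as disqualifying, since a straggler may fail to report precisely because it is overloaded.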
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Navaz, A.N.; Kassabi, H.T.E.; Serhani, M.A.; Barka, E.S. Resource-Aware Federated Hybrid Profiling for Edge Node Selection in Federated Patient Similarity Network. Appl. Sci. 2023, 13, 13114. https://doi.org/10.3390/app132413114
