Article
Peer-Review Record

Resource-Aware Federated Hybrid Profiling for Edge Node Selection in Federated Patient Similarity Network

Appl. Sci. 2023, 13(24), 13114; https://doi.org/10.3390/app132413114
by Alramzana Nujum Navaz 1,*, Hadeel T. El Kassabi 2, Mohamed Adel Serhani 3 and Ezedin S. Barka 4
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 6 October 2023 / Revised: 28 November 2023 / Accepted: 4 December 2023 / Published: 8 December 2023

Round 1

Reviewer 1 Report (New Reviewer)

Comments and Suggestions for Authors

The paper proposes a resource-aware federated hybrid profiling approach that measures the available static and dynamic resources of the edge nodes. The dynamic resources are continuously monitored for each workload at the edge, and the resource utilization patterns are used to construct the Edge Workload Profile (EWP).
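As a rough illustration of the EWP idea summarized above, the sketch below keeps a node's static capacity alongside a sliding window of dynamic utilization samples and derives free capacity from the window mean. The class name, the fields (`cpu_cores`, `memory_gb`), the window length and the use of a simple mean are all assumptions for illustration; they are not taken from the paper.

```python
from collections import deque
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class EdgeWorkloadProfile:
    """Hypothetical EWP: static capacity plus recent dynamic utilization."""
    node_id: str
    cpu_cores: int      # static resource (assumed field)
    memory_gb: float    # static resource (assumed field)
    window: int = 5     # number of recent samples kept (assumed)
    cpu_util: deque = field(default_factory=deque)  # CPU usage in percent
    mem_util: deque = field(default_factory=deque)  # memory usage in percent

    def record(self, cpu_pct: float, mem_pct: float) -> None:
        """Append one monitoring sample and drop samples outside the window."""
        for buf, value in ((self.cpu_util, cpu_pct), (self.mem_util, mem_pct)):
            buf.append(value)
            if len(buf) > self.window:
                buf.popleft()

    def summary(self) -> dict:
        """Summarize the utilization pattern as window means and free capacity."""
        cpu = mean(self.cpu_util) if self.cpu_util else 0.0
        mem = mean(self.mem_util) if self.mem_util else 0.0
        return {
            "node": self.node_id,
            "avg_cpu_pct": cpu,
            "avg_mem_pct": mem,
            "free_cores": self.cpu_cores * (1 - cpu / 100),
            "free_mem_gb": self.memory_gb * (1 - mem / 100),
        }


# Usage: with window=3 only the last three samples shape the profile.
ewp = EdgeWorkloadProfile("edge-1", cpu_cores=4, memory_gb=8.0, window=3)
for cpu, mem in [(20, 40), (40, 50), (60, 60), (80, 70)]:
    ewp.record(cpu, mem)
profile = ewp.summary()
```

Here the first sample falls out of the window, so the profile reflects an average of 60% CPU and 60% memory use, leaving roughly 1.6 cores and 3.2 GB free.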

The abstract is too long and unclear. It must be thoroughly revised and shortened. The authors should clearly state the area of research and the contributions of the paper without going into too much detail.

In the introduction, the most important notions, such as federated learning, Federated Resource Profiling, QoS, and Federated Optimization, are only briefly explained. Some of these notions should be explained in more detail. For instance, arguably the most important aspect of every service system in general (and of telecommunication networks in particular) is the Quality of Service (QoS)/Quality of Experience (QoE). I recommend that the authors explain the notions of QoS/QoE in more detail and include references to International Telecommunication Union (ITU) documents, for example, the QoS regulations (ITU-T Supp. 9 of E.800 Series) and the vocabulary for performance, quality of service and quality of experience.

Also, in recent years there have been studies on QoS and QoE in heterogeneous networks that address important problems related to determining QoS. For example, an overall normalization approach for determining QoS in overall telecommunication systems is described in:

S. A. Poryazov, et al., "Overall Model Normalization towards Adequate Prediction and Presentation of QoE in Overall Telecommunication Systems," 2019 14th International Conference on Advanced Technologies, Systems and Services in Telecommunications (TELSIKS), Nis, Serbia, 2019, pp. 360-363, doi: 10.1109/TELSIKS46999.2019.9002295.

The proposed approach successfully solves QoS estimation problems related to heterogeneous networks and the need for scalability of the QoS indicators.

The paper

Huang, W.; Han, Z.; Zhao, L.; Xu, H.; Li, Z.; Wang, Z. Resource Allocation for Intelligent Reflecting Surfaces Assisted Federated Learning System with Imperfect CSI. Algorithms 2021, 14, 363.

studies the problem of computation and communication resource allocation in an intelligent reflecting surface-assisted federated learning (FL) system with imperfect channel state information (CSI). Specifically, the authors take the statistical CSI error model into consideration and formulate a training time minimization problem subject to rate outage probability constraints. To solve this problem, semidefinite relaxation (SDR) and the constrained concave-convex procedure (CCCP) are invoked to transform it into a convex problem. Subsequently, a low-complexity algorithm is proposed to minimize the delay of the FL system.

The mathematical framework is correctly described.

The numerical experiments verify the models.

Overall, this is an excellent paper. I recommend that it be published after a minor revision related to the related works section and the notion of QoS and the problems related to its estimation.

Comments on the Quality of English Language

Minor editing is required. There are some unclear phrases and sentences.

 

Author Response

Resource-aware Federated Hybrid Profiling for Edge Node Selection in Federated PSN (FPSN)

Manuscript ID - applsci-2675559

 

Reviewer 1

The paper proposes a resource-aware federated hybrid profiling approach that measures the available static and dynamic resources of the edge nodes. The dynamic resources are continuously monitored for each workload at the edge, and the resource utilization patterns are used to construct the Edge Workload Profile (EWP).

The abstract is too long and unclear. It must be thoroughly revised and shortened. The authors should clearly state the area of research and the contributions of the paper without going into too much detail.

Response

We appreciate your feedback, and we agree with the suggestion to revise and shorten the abstract. We have revised the abstract and presented a clearer and more concise statement of the research area and the contributions of the paper.

In the introduction, the most important notions, such as federated learning, Federated Resource Profiling, QoS, and Federated Optimization, are only briefly explained. Some of these notions should be explained in more detail. For instance, arguably the most important aspect of every service system in general (and of telecommunication networks in particular) is the Quality of Service (QoS)/Quality of Experience (QoE). I recommend that the authors explain the notions of QoS/QoE in more detail and include references to International Telecommunication Union (ITU) documents, for example, the QoS regulations (ITU-T Supp. 9 of E.800 Series) and the vocabulary for performance, quality of service and quality of experience.

Also, in recent years there have been studies on QoS and QoE in heterogeneous networks that address important problems related to determining QoS. For example, an overall normalization approach for determining QoS in overall telecommunication systems is described in:

S. A. Poryazov, et al., "Overall Model Normalization towards Adequate Prediction and Presentation of QoE in Overall Telecommunication Systems," 2019 14th International Conference on Advanced Technologies, Systems and Services in Telecommunications (TELSIKS), Nis, Serbia, 2019, pp. 360-363, doi: 10.1109/TELSIKS46999.2019.9002295.

The proposed approach successfully solves QoS estimation problems related to heterogeneous networks and the need for scalability of the QoS indicators.

The paper

Huang, W.; Han, Z.; Zhao, L.; Xu, H.; Li, Z.; Wang, Z. Resource Allocation for Intelligent Reflecting Surfaces Assisted Federated Learning System with Imperfect CSI. Algorithms 2021, 14, 363.

studies the problem of computation and communication resource allocation in an intelligent reflecting surface-assisted federated learning (FL) system with imperfect channel state information (CSI). Specifically, the authors take the statistical CSI error model into consideration and formulate a training time minimization problem subject to rate outage probability constraints. To solve this problem, semidefinite relaxation (SDR) and the constrained concave-convex procedure (CCCP) are invoked to transform it into a convex problem. Subsequently, a low-complexity algorithm is proposed to minimize the delay of the FL system.

Response

Thank you for your valuable feedback on QoS and QoE. We have added a new subsection (Section D) to the Related Work section of our article that addresses these concepts in detail, as you suggested.

The mathematical framework is correctly described.

The numerical experiments verify the models.

Overall, this is an excellent paper. I recommend that it be published after a minor revision related to the related works section and the notion of QoS and the problems related to its estimation.

Response

We greatly appreciate the positive feedback and recommendation for publication. We have addressed the minor revision related to the related works section and provided a more detailed explanation of QoS as suggested by the reviewer.

Comments on the Quality of English Language

Minor editing is required. There are some unclear phrases and sentences.

Response

We have carefully reviewed and edited the manuscript to clarify and refine the language for enhanced clarity and coherence. Your insightful inputs have significantly improved the manuscript, and we sincerely thank you for your valuable guidance.


Author Response File: Author Response.pdf

Reviewer 2 Report (New Reviewer)

Comments and Suggestions for Authors

1.    Summary of Contributions

    This paper makes a noteworthy contribution to the domain of edge computing by addressing the computational straggler issues that arise from the resource heterogeneity of edge nodes. The research presents a resource-aware federated hybrid profiling approach, which adeptly identifies and measures static and dynamic resource capabilities at the edge to optimize Quality of Service (QoS) and performance in real-time eHealth applications. By constructing an Edge Workload Profile and implementing Federated Workload Profiling, the study enables the selection of edge nodes best suited for Federated Patient Similarity Network models, ensuring efficient processing and timely updates. The proposed model's efficacy is demonstrated through experiments that evaluate edge performance metrics, proving its potential to enhance resource utilization and support time-critical health applications. The approach not only promises faster convergence of machine learning algorithms through strategic node selection but also maintains a lightweight profile federation, minimizing computational overhead as the network scales. This research lays the groundwork for future advancements that could further improve collaboration between nodes and resource distribution, with an emphasis on extending the model to facilitate resource offloading for enhanced scalability and efficiency in decentralized healthcare scenarios.
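The node-selection step described in this summary can be pictured as a filter-then-rank procedure. The sketch below assumes per-node profiles that expose free CPU cores and free memory, hypothetical thresholds, and an equal weighting of the two resources; the function name and scoring rule are illustrative, and the paper's actual selection criteria may differ.

```python
from typing import Dict, List


def select_edge_nodes(profiles: Dict[str, Dict[str, float]],
                      min_free_cores: float,
                      min_free_mem_gb: float,
                      k: int) -> List[str]:
    """Hypothetical resource-aware selection of k edge nodes.

    Nodes below either resource threshold are excluded; the remainder are
    ranked by an equally weighted score of normalized free CPU and memory.
    """
    eligible = {
        node: p for node, p in profiles.items()
        if p["free_cores"] >= min_free_cores and p["free_mem_gb"] >= min_free_mem_gb
    }
    if not eligible:
        return []
    # Normalize by the best eligible node so both resources share one scale.
    max_cores = max(p["free_cores"] for p in eligible.values()) or 1.0
    max_mem = max(p["free_mem_gb"] for p in eligible.values()) or 1.0

    def score(p: Dict[str, float]) -> float:
        return 0.5 * p["free_cores"] / max_cores + 0.5 * p["free_mem_gb"] / max_mem

    ranked = sorted(eligible, key=lambda node: score(eligible[node]), reverse=True)
    return ranked[:k]


profiles = {
    "edge-1": {"free_cores": 1.6, "free_mem_gb": 3.2},
    "edge-2": {"free_cores": 3.0, "free_mem_gb": 6.0},
    "edge-3": {"free_cores": 0.5, "free_mem_gb": 7.0},  # fails the CPU threshold
}
chosen = select_edge_nodes(profiles, min_free_cores=1.0, min_free_mem_gb=2.0, k=2)
```

In this example `edge-3` is filtered out despite its large free memory, and `edge-2` ranks ahead of `edge-1`. Filtering before ranking keeps nodes that would likely become stragglers out of the round entirely.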

 

2.    Weaknesses & Suggestions

(1)  Prior research has explored methods for selecting optimal nodes to accelerate and enhance the training process, leading to improved model performance. For instance, works like Oort [1] and PyramidFL [2] have introduced utility functions that leverage statistical and system characteristics to effectively choose nodes. These approaches have demonstrated their capacity to reduce communication costs while maintaining high performance and have been validated in large-scale federated learning environments through detailed experiments. However, it's worth noting that this paper primarily addresses system heterogeneity by considering different types of edge resources while not explicitly addressing statistical heterogeneity, a common phenomenon in federated learning.

(2)  In this paper, it is mentioned that the thresholds for dynamic adjustment can be set based on workload and resource utilization patterns over time (lines 259 & 260). However, the specific mechanism for dynamically setting these thresholds is not elaborated upon, which leaves room for clarification. A more detailed explanation of the dynamic threshold adjustment process would enhance the practical applicability of the proposed method. Failing to design an adaptive mechanism for threshold adjustment could potentially limit the suitability of this approach for real-world applications.

(3)  The paper mentions that the dataset consists of 2,126 Cardiotocogram instances with 23 attributes and that the entire Fetal Health dataset is divided into six datasets for the six edge nodes in the experimental setup (lines 428-430). The number of training samples appears to be limited, and the data distribution across the nodes could be assumed to be independent and identically distributed (IID). These experimental conditions might not provide sufficient evidence to fully support the efficiency of the proposed design. Furthermore, not addressing data heterogeneity could undermine the conclusions regarding the effectiveness and scalability of the proposed approach in real-world scenarios.

(4) While the paper provides some discussions about computational stragglers, it lacks experiments or in-depth discussions on how the proposed design effectively addresses this issue. While the proposed approach can potentially enhance training efficiency, it remains unclear how it deals with straggler situations. To improve the paper, I suggest the authors consider referencing works such as AD-PSGD [3] and Prague [4] and conducting additional experiments and discussions to address this aspect in more detail.

(5) Expanding the scope of this study to include various datasets and a variety of state-of-the-art baselines in the experimental analysis would significantly enhance the robustness and comprehensiveness of the design.

 

[1] Fan Lai et al. Oort: Efficient federated learning via guided participant selection. In Proc. of OSDI, 2021.

[2] Chenning Li et al. PyramidFL: A fine-grained client selection framework for efficient federated learning. In Proc. of ACM MobiCom, 2022.

[3] Xiangru Lian et al. Asynchronous decentralized parallel stochastic gradient descent. In Proc. of ICML, 2018.

[4] Qinyi Luo et al. Prague: High-performance heterogeneity-aware asynchronous decentralized training. In Proc. of ASPLOS, 2020.

Author Response

Resource-aware Federated Hybrid Profiling for Edge Node Selection in Federated PSN (FPSN)

Manuscript ID - applsci-2675559

 

Reviewer 2

 

  1. Summary of Contributions

    This paper makes a noteworthy contribution to the domain of edge computing by addressing the computational straggler issues that arise from the resource heterogeneity of edge nodes. The research presents a resource-aware federated hybrid profiling approach, which adeptly identifies and measures static and dynamic resource capabilities at the edge to optimize Quality of Service (QoS) and performance in real-time eHealth applications. By constructing an Edge Workload Profile and implementing Federated Workload Profiling, the study enables the selection of edge nodes best suited for Federated Patient Similarity Network models, ensuring efficient processing and timely updates. The proposed model's efficacy is demonstrated through experiments that evaluate edge performance metrics, proving its potential to enhance resource utilization and support time-critical health applications. The approach not only promises faster convergence of machine learning algorithms through strategic node selection but also maintains a lightweight profile federation, minimizing computational overhead as the network scales. This research lays the groundwork for future advancements that could further improve collaboration between nodes and resource distribution, with an emphasis on extending the model to facilitate resource offloading for enhanced scalability and efficiency in decentralized healthcare scenarios.

 

2. Weaknesses & Suggestions

(1)  Prior research has explored methods for selecting optimal nodes to accelerate and enhance the training process, leading to improved model performance. For instance, works like Oort [1] and PyramidFL [2] have introduced utility functions that leverage statistical and system characteristics to effectively choose nodes. These approaches have demonstrated their capacity to reduce communication costs while maintaining high performance and have been validated in large-scale federated learning environments through detailed experiments. However, it's worth noting that this paper primarily addresses system heterogeneity by considering different types of edge resources while not explicitly addressing statistical heterogeneity, a common phenomenon in federated learning.

(2)  In this paper, it is mentioned that the thresholds for dynamic adjustment can be set based on workload and resource utilization patterns over time (lines 259 & 260). However, the specific mechanism for dynamically setting these thresholds is not elaborated upon, which leaves room for clarification. A more detailed explanation of the dynamic threshold adjustment process would enhance the practical applicability of the proposed method. Failing to design an adaptive mechanism for threshold adjustment could potentially limit the suitability of this approach for real-world applications.

 

Response

 

Thank you for your valuable feedback on the dynamic threshold adjustment process. We have addressed your concerns by adding a more detailed explanation, given below, together with a practical use case; we believe these additions significantly enhance the clarity and applicability of the method.

Resource profiling will initially rely on expert-set thresholds that draw on a deep understanding of system capabilities and expected workloads. Over time, as the system accumulates and analyzes historical resource-usage data, these thresholds can be refined to reflect real-time demands and efficiency requirements more accurately. This evolving process allows the system to adapt its resource allocation dynamically, improving performance and resource management based on actual usage patterns. For instance, in a cloud-based service use case, the system initially operates with expert-set CPU and network bandwidth thresholds. It then observes that during peak business hours, CPU usage consistently exceeds these thresholds, leading to performance bottlenecks. By adjusting the CPU threshold dynamically based on this historical data, the system can allocate more processing power during peak hours, thereby reducing latency and improving the user experience. This use case illustrates how adaptive resource profiling can respond to fluctuating workload demands and support efficient resource utilization and system performance.
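One possible way to realize this refinement, offered only as a sketch, is to keep the expert-set value until enough history has accumulated and then move the threshold toward a high percentile of recent utilization. The class name, the 90th-percentile target, the blending weight `alpha` and the window size are illustrative assumptions, not the mechanism specified in the manuscript.

```python
from collections import deque
from statistics import quantiles


class AdaptiveThreshold:
    """Hypothetical adaptive threshold seeded with an expert-set value."""

    def __init__(self, expert_value: float, min_history: int = 20,
                 percentile: int = 90, alpha: float = 0.5, window: int = 500):
        self.value = expert_value            # start from the expert's threshold
        self.min_history = min_history       # samples needed before adapting (>= 2)
        self.percentile = percentile         # percentile of history to track (1..99)
        self.alpha = alpha                   # weight on the data-driven estimate
        self.history = deque(maxlen=window)  # sliding window of observations

    def observe(self, utilization: float) -> float:
        """Record one utilization sample and return the current threshold."""
        self.history.append(utilization)
        if len(self.history) >= self.min_history:
            # quantiles(..., n=100) returns the 99 cut points P1..P99.
            target = quantiles(self.history, n=100)[self.percentile - 1]
            # Move the threshold part of the way toward the observed percentile.
            self.value = (1 - self.alpha) * self.value + self.alpha * target
        return self.value


# Peak-hour scenario: expert threshold of 70% CPU, observed usage near 90%.
cpu_threshold = AdaptiveThreshold(expert_value=70.0, min_history=20)
trace = [cpu_threshold.observe(90.0) for _ in range(21)]
```

With these settings the threshold stays at 70% for the first 19 samples, then rises to 80% and 85% as sustained 90% usage is observed, mirroring the peak-hour example. The partial blending step keeps a single burst from overriding the expert's initial setting.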

 

 

(3)  The paper mentions that the dataset consists of 2,126 Cardiotocogram instances with 23 attributes and that the entire Fetal Health dataset is divided into six datasets for the six edge nodes in the experimental setup (lines 428-430). The number of training samples appears to be limited, and the data distribution across the nodes could be assumed to be independent and identically distributed (IID). These experimental conditions might not provide sufficient evidence to fully support the efficiency of the proposed design. Furthermore, not addressing data heterogeneity could undermine the conclusions regarding the effectiveness and scalability of the proposed approach in real-world scenarios.

(4) While the paper provides some discussions about computational stragglers, it lacks experiments or in-depth discussions on how the proposed design effectively addresses this issue. While the proposed approach can potentially enhance training efficiency, it remains unclear how it deals with straggler situations. To improve the paper, I suggest the authors consider referencing works such as AD-PSGD [3] and Prague [4] and conducting additional experiments and discussions to address this aspect in more detail.

(5) Expanding the scope of this study to include various datasets and a variety of state-of-the-art baselines in the experimental analysis would significantly enhance the robustness and comprehensiveness of the design.

 

[1] Fan Lai et al. Oort: Efficient federated learning via guided participant selection. In Proc. of OSDI, 2021.

[2] Chenning Li et al. PyramidFL: A fine-grained client selection framework for efficient federated learning. In Proc. of ACM MobiCom, 2022.

[3] Xiangru Lian et al. Asynchronous decentralized parallel stochastic gradient descent. In Proc. of ICML, 2018.

[4] Qinyi Luo et al. Prague: High-performance heterogeneity-aware asynchronous decentralized training. In Proc. of ASPLOS, 2020.

 

Response

 

Thank you for your insightful and comprehensive review. We have taken your feedback into consideration and have incorporated points 1, 3, and 4 into a new section titled "Future Directions" in our paper. This section addresses the need for further exploration into optimal node selection, the importance of considering data heterogeneity in federated learning, and the necessity of detailed analysis on handling computational stragglers. We believe these additions will significantly enhance the depth and scope of our research. Please find the newly added section below.

Future Directions

As part of our ongoing efforts to enhance FL systems, we will focus on several key areas of development that align with the evolving challenges in the field. Our future research will concentrate on the following advancements:

Integrating Statistical Heterogeneity and Optimizing Edge Reputation in FL:

In our future work, we aim to incorporate statistical heterogeneity into our resource-aware federated profiling framework, drawing inspiration from methodologies such as Oort [46] and PyramidFL [47]. This will enhance our model's adaptability in environments with diverse data distributions across nodes. Simultaneously, we plan to optimize the edge reputation factor, which assesses the historical performance and contribution of nodes. This optimization will include integrating advanced utility profiling mechanisms that adapt to the changing performance and contributions of nodes over time, enabling more effective and reliable node selection.
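For reference, the Oort paper [46] scores each participant by combining a statistical utility, |B_i| times the root-mean-square of its per-sample training losses, with a system penalty (T/t_i)^alpha that applies only when the client's round duration t_i exceeds the developer-preferred duration T. A minimal sketch of that scoring rule follows; the function name and the default alpha are illustrative assumptions, and Oort's exploration and pacing mechanisms are omitted.

```python
import math
from typing import Sequence


def oort_style_utility(sample_losses: Sequence[float], duration_s: float,
                       preferred_round_s: float, alpha: float = 2.0) -> float:
    """Statistical utility times a straggler penalty, following Oort's rule.

    sample_losses: per-sample training losses B_i observed on the client.
    duration_s: the client's expected round completion time t_i.
    preferred_round_s: developer-preferred round duration T.
    alpha: straggler penalty exponent (illustrative default).
    """
    n = len(sample_losses)
    if n == 0:
        return 0.0
    # |B_i| * sqrt(mean of squared losses)
    statistical = n * math.sqrt(sum(loss * loss for loss in sample_losses) / n)
    # Apply (T / t_i) ** alpha only when the client is slower than T.
    if duration_s > preferred_round_s:
        return statistical * (preferred_round_s / duration_s) ** alpha
    return statistical


fast = oort_style_utility([3.0, 4.0], duration_s=5.0, preferred_round_s=10.0)
slow = oort_style_utility([3.0, 4.0], duration_s=20.0, preferred_round_s=10.0)
```

Two clients with identical losses differ only through the system term: the one that takes twice the preferred round time keeps a quarter of the utility when alpha is 2. A reputation factor of the kind described above could, for example, be folded in as an additional multiplicative term, although that combination is not part of Oort.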

Addressing Computational Stragglers in Federated Learning:

We intend to tackle the challenge of computational stragglers by employing strategies such as the AD-PSGD algorithm [48] and Prague [49] for efficient information exchange and robust asynchronous operation. AD-PSGD [48] aims to improve overall training efficiency, particularly when dealing with nodes that exhibit slow or irregular computational performance. Prague [49] is a high-performance, heterogeneity-aware training method that improves synchronization efficiency and reduces conflicts through techniques such as Partial All-Reduce and static group scheduling, which makes it well suited for optimizing federated learning systems and managing computational stragglers.
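To illustrate the core AD-PSGD update of Lian et al. [48], the sketch below serializes the asynchronous updates: at each step one worker computes a gradient at its local model, averages its model with a randomly chosen neighbor, and then applies the gradient to the averaged model. The ring topology, the toy quadratic losses 0.5 * (x - c_i)^2 and the learning rate are illustrative assumptions; real deployments run these updates concurrently, and Prague's Partial All-Reduce is not shown.

```python
import random
from typing import List, Sequence


def ad_psgd_simulation(targets: Sequence[float], steps: int = 3000,
                       lr: float = 0.05, seed: int = 0) -> List[float]:
    """Serialized simulation of AD-PSGD-style updates on a ring of workers.

    Worker i minimizes 0.5 * (x - targets[i]) ** 2, so the optimum of the
    summed loss is the mean of the targets.
    """
    rng = random.Random(seed)
    n = len(targets)
    x = [0.0] * n                      # one scalar model replica per worker
    for _ in range(steps):
        i = rng.randrange(n)           # the worker that finishes next
        grad = x[i] - targets[i]       # gradient at the model it read earlier
        j = rng.choice([(i - 1) % n, (i + 1) % n])  # random ring neighbor
        avg = 0.5 * (x[i] + x[j])      # atomic pairwise model averaging
        x[i] = avg
        x[j] = avg
        x[i] -= lr * grad              # apply the gradient after averaging
    return x


models = ad_psgd_simulation([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
```

Because no worker waits for a global barrier, a slow worker only delays its own updates rather than the whole round. With a constant learning rate the replicas stay close to, but do not exactly agree on, the optimum of 3.5; shrinking `lr` tightens that spread.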

Expanding Experimentation with Diverse Datasets:

Our experiments are currently limited to a specific, homogeneous dataset, highlighting the need for future research to include more varied and heterogeneous data to comprehensively assess the proposed design's effectiveness and scalability in diverse real-world scenarios.

To validate and refine our approach, we will employ a range of datasets in future experiments. This will enable us to assess the robustness and scalability of our design under different conditions and to conduct comprehensive comparisons with state-of-the-art resource optimization strategies and systems.
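A common way to emulate heterogeneous, non-IID data across edge nodes is label-skewed Dirichlet partitioning: for each class, the nodes' shares are drawn from a symmetric Dirichlet distribution, and a smaller `beta` gives stronger skew. The sketch below assumes integer class labels and uses synthetic labels for 2,126 samples split across six nodes only to mirror the sizes of the experimental setup; it is an illustration, not the authors' partitioning.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def dirichlet_partition(labels: Sequence[int], num_nodes: int,
                        beta: float, seed: int = 0) -> List[List[int]]:
    """Assign every sample index to one node with Dirichlet label skew."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    parts: List[List[int]] = [[] for _ in range(num_nodes)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Symmetric Dirichlet(beta) sample via normalized Gamma(beta, 1) draws.
        draws = [rng.gammavariate(beta, 1.0) for _ in range(num_nodes)]
        total = sum(draws)
        props = [d / total for d in draws] if total > 0 else [1 / num_nodes] * num_nodes
        # Turn cumulative proportions into cut points over this class's indices.
        bounds, acc = [0], 0.0
        for p in props[:-1]:
            acc += p
            bounds.append(min(len(idxs), int(round(acc * len(idxs)))))
        bounds.append(len(idxs))
        for node in range(num_nodes):
            parts[node].extend(idxs[bounds[node]:bounds[node + 1]])
    return parts


# Synthetic labels (three classes) standing in for a 2,126-sample dataset.
labels = [i % 3 for i in range(2126)]
node_indices = dirichlet_partition(labels, num_nodes=6, beta=0.5)
```

Every sample lands on exactly one node, but the class mix now differs from node to node; sweeping `beta` from large (close to IID) to small (highly skewed) is one way to probe how the design behaves under statistical heterogeneity.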

Enhancing Fine-grained Client Selection:

We will also explore fine-grained client selection strategies to improve time-to-accuracy performance in FL systems. A key technique we plan to incorporate is client clustering based on data characteristics. This method involves grouping clients with similar data distributions or computational capabilities, allowing for more targeted and efficient model updates. This approach will focus on exploiting data and system heterogeneity within selected clients for more efficient utility profiling, leading to improved model training efficiency and effectiveness.
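As an illustration of client clustering based on data characteristics, the sketch below represents each client by its normalized label histogram, groups clients with a small k-means, and picks the fastest client from each cluster. The histogram features, the deterministic farthest-point initialization, the per-cluster "fastest client" rule and all names are assumptions made for the example; the planned design may use different features and selection rules.

```python
import math
from typing import Dict, List, Sequence


def label_histogram(labels: Sequence[int], num_classes: int) -> List[float]:
    """Normalized class frequencies of one client's local labels."""
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    total = max(len(labels), 1)
    return [c / total for c in counts]


def kmeans(points: List[List[float]], k: int, iters: int = 20) -> List[int]:
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from all current centroids.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def select_per_cluster(client_labels: Dict[str, Sequence[int]], num_classes: int,
                       k: int, round_time_s: Dict[str, float]) -> List[str]:
    """Cluster clients by label distribution and pick the fastest per cluster."""
    names = sorted(client_labels)
    features = [label_histogram(client_labels[name], num_classes) for name in names]
    assign = kmeans(features, k)
    chosen = []
    for cluster in range(k):
        members = [name for name, a in zip(names, assign) if a == cluster]
        if members:
            chosen.append(min(members, key=lambda name: round_time_s[name]))
    return chosen


clients = {
    "a": [0] * 9 + [1],      # mostly class 0
    "b": [0] * 8 + [1] * 2,
    "c": [2] * 9 + [1],      # mostly class 2
    "d": [2] * 10,
}
times = {"a": 5.0, "b": 3.0, "c": 4.0, "d": 6.0}
picked = select_per_cluster(clients, num_classes=3, k=2, round_time_s=times)
```

Clients `a` and `b` share a class-0-heavy distribution and `c` and `d` a class-2-heavy one, so the selection returns `b` and `c`: one representative of each data distribution, chosen for speed. This combines data heterogeneity (cluster membership) with system heterogeneity (round time) in a single selection step.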

In addition to the areas mentioned, further research is critical to fully evaluate the efficacy and scalability of our system in real-world scenarios, particularly in addressing privacy and security concerns. As improving node collaboration and resource utilization remains a top priority, future developments will focus on enhancing the FWP to allow for resource offloading to resource-rich nodes. By doing so, we aim to maximize the efficiency of resource allocation across the network, ensuring optimal utilization of available resources while maintaining system integrity and security.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report (New Reviewer)

Comments and Suggestions for Authors

The revised version enriches the discussion by delving deeper into design aspects, comparing with state-of-the-art methodologies, and exploring various datasets. These additions somewhat address my initial concerns. I eagerly anticipate further developments in this field.

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.


Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The authors have not adequately addressed the reviewer's comments. Rather, they tried to defend this incomplete manuscript by pointing to their previously published work. The manuscript is merely a continuation of their previous work.

 

 
