Peer-Review Record

Evaluating Task-Level CPU Efficiency for Distributed Stream Processing Systems

Big Data Cogn. Comput. 2023, 7(1), 49; https://doi.org/10.3390/bdcc7010049

by Johannes Rank¹,*, Jonas Herget¹, Andreas Hein² and Helmut Krcmar²

Reviewer 1: Alejandro Zunino

Reviewer 2: Jerry Chou

Submission received: 1 January 2023 / Revised: 1 March 2023 / Accepted: 7 March 2023 / Published: 10 March 2023

(This article belongs to the Special Issue Distributed Applications and Services for Future Internet)

Round 1

Reviewer 1 Report

The paper presents a novel technical approach for evaluating CPU efficiency at the task level using three popular open-source distributed stream processing engines. The main contribution of the approach is the highly efficient measurement platform, which relies on the eBPF support in the Linux kernel. The advantages of eBPF are well known, but in this case they are fully exploited to measure complex distributed systems. In general, the paper is easy to read and, despite its technical depth, self-contained, with enough references and good diagrams. There is, however, one section that is not well integrated with the rest of the paper: Section 3.3 (Formalism). While I understand the goal of this section, it does not help, because the little detail it provides could be better conveyed by a short algorithm or diagram. In other words, Section 3.3 should be removed and replaced by a shorter and clearer description of the same content. Besides, the contents of Section 3.3 are not required by the rest of the paper. There is room for improvement in Section 7, where a table summarizing the main observations of the experiments could help the reader to better understand the results. Moreover, all figures in Section 6 are rather small and hard to read.

Below I list some minor problems:

p3:

"We evaluate the accuracy": this is confusing. Accuracy of what? Then, on page 4, this gets more confusing: "We evaluated consistency". Please revise.

"using micro-benches" -> "using micro-benchmarks"

p4: "Apaches Spark" -> "Apache Spark."

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 2 Report

This work presents an approach and implementation for measuring the CPU efficiency of streaming applications at the task level. The approach is based on eBPF, which allows users to attach small programs to specific kernel functions in order to monitor and modify the behavior of the kernel. It then combines the eBPF monitoring results with stack-trace sampling to calculate CPU consumption at the task level.

Overall, the monitoring approach based on eBPF is not new, but the discussion of task-level CPU efficiency for stream processing systems is interesting. The performance evaluation and analysis in Section 5 also give several insights into the performance issues and behavior of the different stream processing systems by comparing them.

The authors give many details about their implementation. The description is quite clear, but also a bit lengthy. Section 3.3 in particular looks more like a definition of some measurements or profiling metrics than a formalism for problem solving.

Finally, besides showing the limited overhead of eBPF, the authors should compare that overhead with the traditional approach in Fig. 5 (or in that subsection). The authors should also compare their method with other baseline approaches that can report task-level measurements. Otherwise, it is hard to judge whether the proposed approach is necessary or what its benefits are.

The paper can be considered for acceptance after addressing the minor issues mentioned above.

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

All my comments have been addressed properly. I have no more questions.
