Article
Peer-Review Record

DycSe: A Low-Power, Dynamic Reconfiguration Column Streaming-Based Convolution Engine for Resource-Aware Edge AI Accelerators

J. Low Power Electron. Appl. 2023, 13(1), 21; https://doi.org/10.3390/jlpea13010021
by Weison Lin *, Yajun Zhu and Tughrul Arslan
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 9 January 2023 / Revised: 7 March 2023 / Accepted: 9 March 2023 / Published: 16 March 2023
(This article belongs to the Special Issue Low-Power Computation at the Edge)

Round 1

Reviewer 1 Report

This paper is practical for Edge AI Accelerators (from [5], [6]), but I have the following comments.

1. This paper combines the authors’ prior works [5] and [6] with more explanation and several scaled DrCSC design outcomes. However, there seems to be no new or enhanced design or method content in this work for a journal paper. (An extension of conference papers should contain enough new material for a journal submission.)

2. There are several existing works on weight/data mapping (e.g., “Efficient Mobile Implementation of A CNN-based Object Recognition System” (ACM), “VW-SDK: Efficient Convolutional Weight Mapping Using Variable Windows for Processing-In-Memory Architectures” (DATE)); comparing against and discussing only [7] is too weak for a journal paper.

3. The claim of “low-power” is not convincing. The power comparison with [7] is discussed only in lines 237-243 and lines 500-505, and the technical reasons why the authors’ designs are low-power are not clear. Incidentally, the contents of lines 237-243 are almost a copy of [5].

4. There are many FPGA works on Edge AI Accelerators, and the hardware comparison (only [7]) is also too weak for a journal paper.

5. The power results seem to differ from the data in [6], and the conditions are not clear.

Author Response

Dear Reviewer,

Please see the attached for the response.

Author Response File: Author Response.pdf

Reviewer 2 Report

The paper describes a dynamic reconfigurable column streaming convolution approach validated using the Xilinx Vivado synthesis tool. The paper is generally well written, though it is a little long, mostly due to detailed descriptions and numerous figures.

There are, however, some comments that could help improve the work.

The paper refers extensively to the self-citation [4] in the introduction. It would be preferable to avoid citing it so many times and to focus more on other works.

Several approaches are identified in Section 4, but the key idea should be made clear among the existing strategies, instead of a vague description, i.e., before the further detail given in Section 4.1.

It should be clear throughout the text what is added to the pilot works [5] and [6].

Why is "method" in italic?

Figures (e.g., 13) should identify the number of bits used (e.g., in the multiplier).

There are minor typos, mainly missing spaces in the text, that should be corrected.

It should be discussed whether the gains in power are scalable with the number of PEs.

Author Response

Dear Reviewer,

Please see the attached for the response.

Author Response File: Author Response.pdf

Reviewer 3 Report

The authors have presented a dynamic reconfigurable column streaming-based convolution engine with programmable adders for machine learning acceleration at the edge. The hardware architecture, data flow, building block design, etc., have been explained well.

However, the results presented are from an FPGA using Vivado HLS. While the motivation for this design is stated as low power consumption, high performance, and low resource consumption, these metrics are not clear from the results presented. The previous designs mentioned in Table 1 are all ASIC designs. So, it is necessary to present at least some ASIC synthesis results for the proposed design in order to have a fair comparison of power, area, performance, etc. Otherwise, a detailed comparison with previous FPGA designs has to be presented. To summarize, the implementation aspect must be revised, and the comparison with previous literature must be fair and more comprehensive.

Author Response

Dear Reviewer,

Please see the attached for the response.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

I have no more comments.

Author Response

Thanks to the reviewer for examining our response.

Reviewer 3 Report

The authors have addressed most of the previous comments.

However, the concern regarding comparison with previous work is still not addressed. Although this is an FPGA-based implementation, a comparison with previous ASIC designs should be made, since low power is a key motivation for this work. Also, there are many more FPGA-based designs in the literature. Hence, Table 4 must be expanded to include more previous designs for a fair and comprehensive analysis.

Author Response

Dear Reviewer,

We have expanded Table 4 and justified the 'low-power' motivation in the attachment. For details, please see the letter.

Author Response File: Author Response.pdf
