FPGA-Based Deep Neural Network Accelerators Using Emerging Technologies

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Circuit and Signal Processing".

Deadline for manuscript submissions: 20 October 2024

Special Issue Editor


Dr. Chen Yang
Guest Editor
School of Microelectronics, Xi’an Jiaotong University, Xi’an 710049, China
Interests: neural network accelerator; reconfigurable computing; VLSI SoC design

Special Issue Information

Dear Colleagues,

Electronics invites manuscript submissions in the area of deep neural network (DNN) acceleration, covering fast convolution algorithms, sparsification, low-bit quantization, approximate computing, and other emerging technologies or models applied to FPGA-based deep learning accelerators. In recent years, DNNs have been widely used in many rising fields, such as computer vision, object segmentation, and autonomous driving, achieving excellent performance. Thanks to their favorable power efficiency and configurability, FPGAs are among the most common platforms for implementing DNN accelerators. However, the huge number of parameters, the heavy computation workloads, and the rapid evolution of DNN models make it difficult to deploy DNNs in scenarios with tight limits on FPGA resources and demanding performance requirements. Several emerging technologies have the potential to reduce both computation and runtime latency. Yet many open issues remain unresolved when these promising technologies are applied to hardware acceleration, such as how to extend fast convolution algorithms to different convolution types with reduced hardware resources, how to efficiently eliminate invalid zero elements in sparse DNNs, and how to support different degrees of computation parallelism for various low-bit schemes while maintaining stable processing element utilization.
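
As a small, self-contained illustration of one of the emerging techniques named above, the NumPy sketch below implements the textbook 1-D Winograd F(2,3) fast convolution: it produces two outputs of a 3-tap convolution (CNN-style correlation) with four element-wise multiplications instead of six. The transform matrices are the standard Lavin–Gray ones; the code is a generic illustration only and is not tied to any particular accelerator or submission.

```python
import numpy as np

# Standard Winograd F(2,3) transform matrices (Lavin & Gray formulation).
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

def winograd_f2_3(d, g):
    """Two outputs of a 3-tap 1-D convolution (CNN-style correlation)
    using 4 element-wise multiplications instead of 6."""
    U = G @ g        # filter transform (can be precomputed offline)
    V = BT @ d       # input tile transform
    return AT @ (U * V)

d = np.array([1.0, 2.0, 3.0, 4.0])   # 4-sample input tile
g = np.array([0.5, 1.0, -1.0])       # 3-tap filter
direct = np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                   d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
assert np.allclose(winograd_f2_3(d, g), direct)
```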

To tackle these issues and challenges, this Special Issue of Electronics seeks innovative solutions and novel advances in DNN acceleration, including sparsification, low-bit quantization, and approximate computing. It also welcomes entirely new ideas (e.g., other fast convolution algorithms and novel pruning strategies) as well as disruptive FPGA-based accelerators for other emerging networks (e.g., graph neural networks and spiking neural networks) that help address the aforementioned problems and challenges.

In this Special Issue, original research articles and reviews are welcome. Research areas may include (but are not limited to) the following:

  • Novel architecture for FPGA-based DNN accelerators;
  • Hardware/software co-design for FPGA-based DNN accelerators;
  • Resource/bandwidth optimizations for FPGA-based DNN accelerators;
  • FPGA-based DNN accelerators using fast convolution algorithms;
  • FPGA-based accelerators for sparse DNNs;
  • FPGA-based DNN accelerators performing low-bit/mixed-bit quantization;
  • FPGA-based DNN accelerators using approximate computing;
  • Dynamically reconfigurable DNN accelerators;
  • FPGA-based graph convolutional neural network acceleration;
  • FPGA-based spiking neural network acceleration;
  • FPGA-based transformer network acceleration;
  • FPGA-based acceleration of other atypical convolutions/networks;
  • Emerging applications of FPGA-based DNN accelerators.

Dr. Chen Yang
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • deep learning
  • convolutional neural network
  • fast convolution algorithm
  • sparsification
  • low-bit quantization
  • approximate computing
  • reconfigurable computing
  • FPGA

Published Papers (3 papers)


Research

24 pages, 10357 KiB  
Article
Design of a Generic Dynamically Reconfigurable Convolutional Neural Network Accelerator with Optimal Balance
by Haoran Tong, Ke Han, Si Han and Yingqi Luo
Electronics 2024, 13(4), 761; https://doi.org/10.3390/electronics13040761 - 14 Feb 2024
Abstract
In many scenarios, edge devices perform computations for applications such as target detection and tracking, multimodal sensor fusion, low-light image enhancement, and image segmentation. There is an increasing trend of deploying and running multiple different network models on one hardware platform, but generic acceleration architectures that support standard convolution (CONV), depthwise separable CONV, and deconvolution (DeCONV) layers in such complex scenarios are lacking. In response, this paper proposes a more versatile dynamically reconfigurable CNN accelerator with a highly unified computing scheme. The proposed design, which is compatible with standard CNNs, lightweight CNNs, and CNNs with DeCONV layers, further improves resource utilization and narrows the efficiency gap when deploying different models, enhancing the hardware balance during the alternating execution of multiple models. Compared to a state-of-the-art CNN accelerator, the Xilinx DPU B4096, our optimized architecture achieves resource utilization improvements of 1.08× for VGG16 and 1.77× for MobileNetV1 in inference tasks on the Xilinx ZCU102 platform. The resource utilization and efficiency degradation between these two models are reduced to 59.6% and 63.7%, respectively. Furthermore, the proposed architecture properly runs DeCONV layers and demonstrates good performance.
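
As background to the unified-computation idea in the abstract above, a transposed convolution (DeCONV) can always be re-expressed as a standard convolution over a zero-inserted, zero-padded input with a spatially flipped kernel, which is one common way a single CONV datapath is reused for DeCONV layers. The NumPy sketch below shows this generic reformulation; it is not the paper's computing scheme, and the function names (conv2d, deconv2d_direct, deconv2d_as_conv) are illustrative.

```python
import numpy as np

def conv2d(x, w):
    """Reference 'valid' 2-D correlation with stride 1."""
    K = w.shape[0]
    H, W = x.shape
    out = np.zeros((H - K + 1, W - K + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + K, j:j + K] * w)
    return out

def deconv2d_direct(x, w, stride=2):
    """Transposed convolution computed by scattering each input pixel."""
    K = w.shape[0]
    H, W = x.shape
    out = np.zeros(((H - 1) * stride + K, (W - 1) * stride + K))
    for i in range(H):
        for j in range(W):
            out[i * stride:i * stride + K, j * stride:j * stride + K] += x[i, j] * w
    return out

def deconv2d_as_conv(x, w, stride=2):
    """The same transposed convolution, re-expressed so a standard CONV
    engine can execute it: zero-insert the input, pad by K-1, and convolve
    with the spatially flipped kernel."""
    K = w.shape[0]
    H, W = x.shape
    up = np.zeros(((H - 1) * stride + 1, (W - 1) * stride + 1))
    up[::stride, ::stride] = x          # insert zeros between input pixels
    up = np.pad(up, K - 1)              # full zero padding
    return conv2d(up, w[::-1, ::-1])    # flip kernel, reuse the CONV routine

x = np.arange(9.0).reshape(3, 3)
w = np.array([[1.0, 2.0], [3.0, 0.5]])
assert np.allclose(deconv2d_as_conv(x, w), deconv2d_direct(x, w))
```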

17 pages, 3032 KiB  
Article
WRA-MF: A Bit-Level Convolutional-Weight-Decomposition Approach to Improve Parallel Computing Efficiency for Winograd-Based CNN Acceleration
by Siwei Xiang, Xianxian Lv, Yishuo Meng, Jianfei Wang, Cimang Lu and Chen Yang
Electronics 2023, 12(24), 4943; https://doi.org/10.3390/electronics12244943 - 08 Dec 2023
Abstract
FPGA-based convolutional neural network (CNN) accelerators have been extensively studied recently. To exploit the parallelism of multiplier–accumulator computation in convolution, most FPGA-based CNN accelerators depend heavily on the number of on-chip DSP blocks in the FPGA. Consequently, the performance of the accelerators is restricted by the limited DSPs, leading to an imbalance in the utilization of other FPGA resources. This work proposes a multiplication-free convolutional acceleration scheme (named WRA-MF) to relax the pressure on the required DSP resources. The proposed WRA-MF first employs the Winograd algorithm to reduce the computational density and then performs bit-level convolutional weight decomposition to eliminate the multiplication operations. Furthermore, by extracting common factors, the complexity of the addition operations is reduced. Experimental results on the Xilinx XCVU9P platform show that WRA-MF can achieve 7559 GOP/s throughput at a 509 MHz clock frequency for VGG16. Compared with state-of-the-art works, WRA-MF achieves a 3.47×–27.55× area efficiency improvement. The results indicate that the proposed architecture achieves high area efficiency while ameliorating the imbalance in resource utilization.
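
The general idea behind a multiplication-free, bit-level weight decomposition can be sketched in plain Python: each weight is split into its bits, so a multiply becomes a few shifts and adds, and factoring the shift out of each bit plane across a dot product echoes the "extracting common factors" optimization mentioned in the abstract. This is a simplified software analogy, not the WRA-MF datapath; the function names are ours.

```python
def shift_add_multiply(x, w, bits=8):
    """x * w for an unsigned 'bits'-wide integer weight w, using only
    shifts and adds (no multiplier)."""
    acc = 0
    for b in range(bits):
        if (w >> b) & 1:       # weight bit b is set
            acc += x << b      # add the correspondingly shifted activation
    return acc

def bitplane_dot(ws, xs, bits=8):
    """Dot product with the shift factored out per bit plane: activations
    that share a set weight bit are summed first, then shifted once."""
    total = 0
    for b in range(bits):
        plane = sum(x for w, x in zip(ws, xs) if (w >> b) & 1)
        total += plane << b
    return total

ws = [3, 5, 7, 0, 12]          # unsigned integer weights
xs = [10, -2, 4, 9, 1]         # activations
ref = sum(w * x for w, x in zip(ws, xs))
assert sum(shift_add_multiply(x, w) for w, x in zip(ws, xs)) == ref
assert bitplane_dot(ws, xs) == ref
```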

12 pages, 516 KiB  
Article
Timing-Driven Simulated Annealing for FPGA Placement in Neural Network Realization
by Le Yu and Baojin Guo
Electronics 2023, 12(17), 3562; https://doi.org/10.3390/electronics12173562 - 23 Aug 2023
Cited by 1
Abstract
The simulated annealing algorithm is an extensively utilized heuristic method for heterogeneous FPGA placement. As the application of neural network models on FPGAs proliferates, new challenges emerge for the traditional simulated annealing algorithm in terms of timing. These challenges stem from the large circuit sizes and the high heterogeneity of block proportions typical of neural networks. To address them, this study introduces a timing-driven simulated annealing placement algorithm. The algorithm integrates cluster criticality identification during the cluster selection phase, which increases the probability that high-criticality clusters are selected. In the cluster movement phase, the proposed method employs an improved weighted center movement for high-criticality clusters and a random movement strategy for other clusters. Experiments demonstrate that the proposed placement algorithm decreases the average wire length by 1.52% and the average critical path delay by 5.03%, at the cost of a marginal 5.01% increase in runtime compared to VTR 8.0.
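
For readers unfamiliar with simulated-annealing placement, the toy Python sketch below shows the generic skeleton the abstract builds on: a criticality-weighted wirelength cost, a weighted-center move for timing-critical blocks, a random move for the rest, and Metropolis acceptance. It is purely illustrative, with made-up names and parameters, and does not reproduce the paper's algorithm or VTR's implementation.

```python
import math
import random

random.seed(0)
GRID, N_BLOCKS = 8, 16
# Random initial placement and random two-pin nets; each net carries a
# weight in (0, 1) standing in for its timing criticality.
pos = {b: (random.randrange(GRID), random.randrange(GRID)) for b in range(N_BLOCKS)}
nets = [(random.randrange(N_BLOCKS), random.randrange(N_BLOCKS), random.random())
        for _ in range(40)]

def cost():
    """Criticality-weighted Manhattan wirelength (toy timing-aware cost)."""
    return sum(w * (abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1]))
               for a, b, w in nets)

def criticality(blk):
    """Criticality of a block = weight of its most critical incident net."""
    return max((w for a, b, w in nets if blk in (a, b)), default=0.0)

temp, cooling, cur = 5.0, 0.995, cost()
for _ in range(5000):
    blk = random.randrange(N_BLOCKS)
    old = pos[blk]
    if criticality(blk) > 0.8:
        # High-criticality block: move toward the weighted center of the
        # blocks it connects to.
        pts = [(pos[b] if a == blk else pos[a], w) for a, b, w in nets if blk in (a, b)]
        tot = sum(w for _, w in pts)
        cx = round(sum(p[0] * w for p, w in pts) / tot)
        cy = round(sum(p[1] * w for p, w in pts) / tot)
        pos[blk] = (min(GRID - 1, max(0, cx)), min(GRID - 1, max(0, cy)))
    else:
        # Non-critical block: plain random move.
        pos[blk] = (random.randrange(GRID), random.randrange(GRID))
    new = cost()
    # Metropolis acceptance: keep improvements; sometimes keep worse moves.
    if new > cur and random.random() >= math.exp((cur - new) / temp):
        pos[blk] = old
    else:
        cur = new
    temp *= cooling

print("final weighted wirelength:", round(cur, 2))
```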
