FPGA-Based Accelerators of Deep Learning and Neuromorphic Computing

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: closed (15 May 2023) | Viewed by 5938

Special Issue Editor

Dr. Yufei Ma
Institute for Artificial Intelligence, School of Integrated Circuits, Peking University, Beijing 100871, China
Interests: FPGA hardware systems; deep learning acceleration; energy-efficient VLSI design

Special Issue Information

Dear Colleagues,

In the last decade, deep learning and neuromorphic computing have demonstrated great success in various tasks of artificial intelligence (AI), such as computer vision and natural language processing. However, the ever-increasing computational complexity and memory requirements of evolving AI algorithms continue to challenge state-of-the-art computing platforms, making it difficult to achieve both high performance and high energy efficiency. Meanwhile, neuromorphic computing, inspired by the brain's biological neural networks, is rapidly developing as a promising route to ultra-low-power operation and human-like intelligence. Field-programmable gate arrays (FPGAs) have attracted increasing interest as accelerators for deep learning and neuromorphic algorithms at both the edge and in the cloud, thanks to their (1) high reconfigurability, (2) fast deployment, (3) customized architectures, and (4) support for hardware/software co-design on SoCs with embedded CPUs.

This Special Issue covers advanced techniques for the hardware acceleration of deep learning and neuromorphic algorithms on FPGAs, ranging from microarchitecture design to automatic compilation, as well as hardware-friendly algorithm optimization. Topics of interest include, but are not limited to, the latest ongoing research efforts in:

  1. Algorithm–hardware co-design for FPGA-based intelligent acceleration;
  2. System and software for FPGA accelerator compilation;
  3. Reconfigurable/adaptive computing for AI/ML;
  4. FPGA-based rapid prototyping of AI/ML systems;
  5. Programmable neuromorphic computing architectures on FPGAs;
  6. Implementation of novel intelligent applications on FPGAs;
  7. AI/ML systems based on coarse-grained reconfigurable architectures (CGRAs);
  8. Implementation and evaluation of spiking neural networks on FPGAs.

Dr. Yufei Ma
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • FPGA accelerator
  • algorithm–hardware co-design
  • deep neural networks
  • neuromorphic computing

Published Papers (3 papers)

Research

20 pages, 1015 KiB  
Article
A High-Performance FPGA-Based Depthwise Separable Convolution Accelerator
by Jiye Huang, Xin Liu, Tongdong Guo and Zhijin Zhao
Electronics 2023, 12(7), 1571; https://doi.org/10.3390/electronics12071571 - 27 Mar 2023
Cited by 1 | Viewed by 1411
Abstract
Depthwise separable convolution (DSC) significantly reduces the number of parameters and floating-point operations with an acceptable loss of accuracy, and has been widely used in lightweight convolutional neural network (CNN) models. In practical applications, however, DSC accelerators based on graphics processing units (GPUs) cannot fully exploit the performance of DSC and are unsuitable for mobile application scenarios. Moreover, low resource utilization due to idle engines is a common problem in DSC accelerator design. In this paper, a high-performance DSC hardware accelerator based on field-programmable gate arrays (FPGAs) is proposed. A highly reusable and scalable multiplication-and-accumulation engine improves the utilization of computational resources, and efficient convolution algorithms are proposed for depthwise convolution (DWC) and pointwise convolution (PWC), respectively, to reduce on-chip memory occupancy. The proposed convolution algorithms also achieve partial fusion between PWC and DWC, improving off-chip memory access efficiency. To maximize bandwidth utilization and reduce latency when reading feature maps, an address mapping method for off-chip accesses is proposed. The performance of the proposed accelerator is demonstrated by implementing MobileNetV2 on an Intel Arria 10 GX660 FPGA in Verilog HDL. The experimental results show that the proposed DSC accelerator achieves 205.1 FPS, 128.8 GFLOPS, and 0.24 GOPS/DSP for input images of size 224×224×3.
(This article belongs to the Special Issue FPGA-Based Accelerators of Deep Learning and Neuromorphic Computing)
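The parameter savings that motivate DSC are easy to verify in a few lines. The following minimal PyTorch sketch (our illustration, not the authors' accelerator code; the channel counts and kernel size are arbitrary) compares a standard convolution with its depthwise separable equivalent:

  import torch.nn as nn

  C_in, C_out, K = 32, 64, 3  # arbitrary example channel counts and kernel size

  # Standard convolution: C_in * C_out * K * K weights
  standard = nn.Conv2d(C_in, C_out, K, padding=1, bias=False)

  # Depthwise separable convolution = depthwise (DWC) + pointwise (PWC)
  dwc = nn.Conv2d(C_in, C_in, K, padding=1, groups=C_in, bias=False)  # one KxK filter per channel
  pwc = nn.Conv2d(C_in, C_out, 1, bias=False)                         # 1x1 cross-channel mixing

  params = lambda m: sum(p.numel() for p in m.parameters())
  print(params(standard))           # 18432
  print(params(dwc) + params(pwc))  # 288 + 2048 = 2336, roughly 8x fewer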

15 pages, 2573 KiB  
Article
F-LSTM: FPGA-Based Heterogeneous Computing Framework for Deploying LSTM-Based Algorithms
by Bushun Liang, Siye Wang, Yeqin Huang, Yiling Liu and Linpeng Ma
Electronics 2023, 12(5), 1139; https://doi.org/10.3390/electronics12051139 - 26 Feb 2023
Cited by 5 | Viewed by 2225
Abstract
Long Short-Term Memory (LSTM) networks have been widely used to solve sequence modeling problems. For researchers, a common solution is to use an LSTM network as the core of an algorithm and combine it with pre-processing and post-processing stages. As an ideal hardware platform for LSTM inference, the field-programmable gate array (FPGA), with its low power consumption and low latency, can accelerate the execution of such algorithms. However, implementing LSTM networks on FPGAs requires specialized hardware and software knowledge and optimization skills, which is a challenge for researchers. To reduce the difficulty of deploying LSTM networks on FPGAs, we propose F-LSTM, an FPGA-based framework for heterogeneous computing. With F-LSTM, researchers can quickly deploy LSTM-based algorithms to heterogeneous computing platforms: the FPGA automatically takes over the computation of the LSTM network, while the CPU performs the pre-processing and post-processing. To support algorithm design, model compression, and deployment, we also propose a workflow based on F-LSTM that integrates PyTorch to increase usability. Experimental results on sentiment analysis tasks show that deploying algorithms on the F-LSTM platform achieves a 1.8× performance improvement and a 5.4× energy-efficiency improvement compared to a GPU. The results also validate the need for heterogeneous computing systems. In conclusion, our work reduces the difficulty of deploying LSTM networks on FPGAs while preserving algorithm performance compared to previous work.
(This article belongs to the Special Issue FPGA-Based Accelerators of Deep Learning and Neuromorphic Computing)
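The CPU/FPGA split described in the abstract is easy to picture on a concrete model. The sketch below is a hypothetical partitioning we wrote for illustration (the layer sizes are made up, and F-LSTM's actual API is not shown here); the comments mark which stages of a PyTorch sentiment model would map to each device:

  import torch
  import torch.nn as nn

  class SentimentLSTM(nn.Module):
      def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=256):
          super().__init__()
          self.embed = nn.Embedding(vocab_size, embed_dim)              # CPU: pre-processing
          self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)  # FPGA: LSTM core
          self.fc = nn.Linear(hidden_dim, 2)                            # CPU: post-processing

      def forward(self, token_ids):
          x = self.embed(token_ids)
          _, (h_n, _) = self.lstm(x)   # under F-LSTM, this computation would be offloaded to the FPGA
          return self.fc(h_n[-1])      # classify from the last hidden state

  model = SentimentLSTM()
  logits = model(torch.randint(0, 10000, (4, 32)))  # batch of 4 token sequences of length 32
  print(logits.shape)                               # torch.Size([4, 2])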

18 pages, 6125 KiB  
Article
Hardware Acceleration and Implementation of YOLOX-s for On-Orbit FPGA
by Ling Wang, Hai Zhou, Chunjiang Bian, Kangning Jiang and Xiaolei Cheng
Electronics 2022, 11(21), 3473; https://doi.org/10.3390/electronics11213473 - 26 Oct 2022
Cited by 2 | Viewed by 1752
Abstract
The rapid development of remote sensing technology has brought about a sharp increase in the amount of remote sensing image data. However, due to a satellite's limited hardware resources, space, and power budget, it is difficult to process massive remote sensing images efficiently and robustly with traditional image processing methods. Moreover, as the volume of remote sensing data grows, satellite-to-ground target detection places ever higher demands on speed and accuracy. To solve these problems, this paper proposes an efficient and reliable acceleration architecture for forward inference of the YOLOX-s detection network on an on-orbit FPGA. Considering the limited onboard resources, parallel loop unrolling of the input and output channels is adopted to build the largest possible DSP computing array, ensuring full and reliable utilization of the limited computing resources and reducing the inference latency of the entire network. Meanwhile, a three-path cache queue and a small-scale cascaded pooling array are designed, which maximize the reuse of on-chip cached data, effectively reduce the bandwidth bottleneck of the external memory, and keep the entire computing array busy. The experimental results show that at a 200 MHz operating frequency on the VC709, the overall inference performance of the FPGA accelerator reaches 399.62 GOPS, peak performance reaches 408.4 GOPS, and the overall computing efficiency of the DSP array reaches 97.56%. Compared with previous work, our architecture further improves computing efficiency under limited hardware resources.
(This article belongs to the Special Issue FPGA-Based Accelerators of Deep Learning and Neuromorphic Computing)
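To make the unrolling strategy concrete, here is a NumPy sketch we wrote for illustration (the tile sizes Tin and Tout are made up; the paper sizes its array to the VC709's DSP budget). It shows how unrolling the input- and output-channel loops maps a convolution onto a Tin x Tout multiplier array, with the remaining loops executed sequentially:

  import numpy as np

  Tin, Tout = 8, 16  # hypothetical unroll factors for input/output channels

  def conv_tiled(ifm, weights):
      """ifm: (C_in, H, W); weights: (C_out, C_in, K, K); stride 1, no padding."""
      C_out, C_in, K, _ = weights.shape
      H, W = ifm.shape[1] - K + 1, ifm.shape[2] - K + 1
      ofm = np.zeros((C_out, H, W))
      for co in range(0, C_out, Tout):          # sequential tile loops
          for ci in range(0, C_in, Tin):
              for y in range(H):
                  for x in range(W):
                      for ky in range(K):
                          for kx in range(K):
                              # The two innermost channel loops are fully unrolled
                              # in hardware: Tin * Tout MACs fire every cycle.
                              for uo in range(co, min(co + Tout, C_out)):
                                  for ui in range(ci, min(ci + Tin, C_in)):
                                      ofm[uo, y, x] += (weights[uo, ui, ky, kx]
                                                        * ifm[ui, y + ky, x + kx])
      return ofm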
