Beyond Moore’s Law: Hardware Specialization and Advanced System on Chip

A special issue of Micromachines (ISSN 2072-666X). This special issue belongs to the section "E:Engineering and Technology".

Deadline for manuscript submissions: closed (16 June 2023) | Viewed by 11223

Special Issue Editor


Guest Editor
Computer Engineering, University of Houston Clear Lake, Houston, TX 77058, USA
Interests: ASIC/FPGA design and verification; SoC architecture; hardware acceleration on neural networks; high-level synthesis; hardware construction language

Special Issue Information

Dear Colleagues,

As transistors continue to scale and shrink, Moore's Law has guided the semiconductor industry for over 50 years, making modern ICs faster, smaller, less power-hungry, and cheaper to manufacture. Although Moore's Law is coming to an end due to physical limitations, there is no doubt that computing performance will continue to improve through a variety of other means. Typical examples include purpose-built architectures such as Google's tensor processing unit, and application-specific designs such as Nervana's AI architecture, Facebook's Big Sur, and Microsoft's FPGA (field-programmable gate array) Configurable Cloud. Accordingly, this Special Issue seeks to showcase research papers and review articles that focus on advanced SoC architectures and specialized designs for various applications, including but not limited to FPGA acceleration of neural networks, software-hardware co-design with FPGAs, ASIC implementation with high-level synthesis, high-abstraction-level design with HCLs (hardware construction languages), novel verification methods and methodologies, energy-efficient SoCs, and custom IP designs.

Dr. Xiaokun Yang
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Micromachines is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • ASIC/FPGA design
  • SoC architecture
  • hardware acceleration
  • RTL Design with HDL (Hardware Description Language), HCL (Hardware Construction Language), HLS (High-level Synthesis)
  • neural networks
  • verification methodology

Published Papers (7 papers)


Editorial


2 pages, 156 KiB  
Editorial
Editorial for the Beyond Moore’s Law: Hardware Specialization and Advanced System on Chip
by Xiaokun Yang
Micromachines 2023, 14(8), 1583; https://doi.org/10.3390/mi14081583 - 11 Aug 2023
Viewed by 571
Abstract
In the absence of a new transistor technology to replace CMOS, design specialization has emerged as one of the most immediate options for achieving high-performance computing [...] Full article

Research


18 pages, 944 KiB  
Article
Parameterizable Design on Convolutional Neural Networks Using Chisel Hardware Construction Language
by Mukesh Chowdary Madineni, Mario Vega and Xiaokun Yang
Micromachines 2023, 14(3), 531; https://doi.org/10.3390/mi14030531 - 24 Feb 2023
Cited by 2 | Viewed by 1353
Abstract
This paper presents a parameterizable design generator for convolutional neural networks (CNNs) using the Chisel hardware construction language (HCL). By parameterizing structural designs such as the streaming width, pooling layer type, and floating-point precision, multiple register-transfer level (RTL) implementations can be created to meet various accuracy and hardware cost requirements. The evaluation is based on generated RTL designs including 16-bit, 32-bit, 64-bit, and 128-bit implementations on field-programmable gate arrays (FPGAs). The experimental results show that the 32-bit design achieves optimal hardware performance when setting the same weights for estimating the quality of the results, FPGA slice count, and power dissipation. Although the focus is on CNNs, the approach can be extended to other neural network models for efficient RTL design. Full article
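The generator idea above can be pictured in software (the paper's actual generator is written in Chisel and emits RTL; everything below, including the function names and the reduced-precision rounding that stands in for the paper's floating-point width parameter, is an illustrative analogy, not the published design):

```python
def make_layer(width_bits, pool="max"):
    """Return a (quantize, pool2) pair configured by generator parameters.

    width_bits selects an illustrative precision; pool selects the
    pooling-layer type -- both mirror the kind of structural parameters
    the paper exposes. Names and scaling are hypothetical.
    """
    scale = 1 << (width_bits // 2)          # illustrative scaling factor

    def quantize(x):
        return round(x * scale) / scale     # reduce precision to the chosen width

    def pool2(a, b, c, d):                  # one 2x2 pooling window
        if pool == "max":
            return max(a, b, c, d)
        return (a + b + c + d) / 4          # average pooling

    return quantize, pool2

# Two differently configured "variants" from the same generator:
q16, p_max = make_layer(16, pool="max")
q32, p_avg = make_layer(32, pool="avg")
```

One generator, several configurations: this is the software counterpart of emitting multiple RTL implementations from one parameterized Chisel description.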

16 pages, 660 KiB  
Article
Efficient Layer-Wise N:M Sparse CNN Accelerator with Flexible SPEC: Sparse Processing Element Clusters
by Xiaoru Xie, Mingyu Zhu, Siyuan Lu and Zhongfeng Wang
Micromachines 2023, 14(3), 528; https://doi.org/10.3390/mi14030528 - 24 Feb 2023
Cited by 2 | Viewed by 1572
Abstract
Recently, the layer-wise N:M fine-grained sparse neural network algorithm (i.e., every M weights contain N non-zero values) has attracted tremendous attention, as it can effectively reduce computational complexity with negligible accuracy loss. However, the speed-up potential of this algorithm will not be fully exploited if the right hardware support is lacking. In this work, we design an efficient accelerator for N:M sparse convolutional neural networks (CNNs) with layer-wise sparse patterns. First, we analyze the performance of different processing element (PE) structures and extensions to construct a flexible PE architecture. Second, variable sparse convolutional dimensions and sparse ratios are incorporated into the hardware design. With the sparse PE cluster (SPEC) design, the hardware can efficiently accelerate CNNs with the layer-wise N:M pattern. Finally, we integrate the proposed SPEC into a CNN accelerator with a flexible network-on-chip and a specially designed dataflow. We implement hardware accelerators on Xilinx ZCU102 and VCU118 FPGAs and evaluate them with classical CNNs such as AlexNet, VGG-16, and ResNet-50. Compared with existing accelerators designed for structured and unstructured pruned networks, our design achieves the best performance in terms of power efficiency. Full article
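The N:M pattern itself is simple to state: within every group of M consecutive weights, only N may be non-zero. As a minimal sketch of that pruning rule (not the paper's accelerator or training algorithm), keeping the n largest-magnitude values per group looks like:

```python
def prune_n_m(weights, n, m):
    """Enforce an N:M sparsity pattern on a flat weight list: within every
    group of m consecutive weights, keep the n largest-magnitude values
    and zero out the rest."""
    assert len(weights) % m == 0, "weight count must be a multiple of m"
    out = []
    for i in range(0, len(weights), m):
        group = weights[i:i + m]
        # indices of the n largest-magnitude weights in this group
        keep = sorted(range(m), key=lambda j: abs(group[j]), reverse=True)[:n]
        out.extend(w if j in keep else 0.0 for j, w in enumerate(group))
    return out
```

For a 2:4 pattern, `prune_n_m([0.1, -0.5, 0.3, 0.05], 2, 4)` keeps only `-0.5` and `0.3`; the accelerator's job is then to skip the zeroed positions entirely.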

12 pages, 359 KiB  
Article
Highly Concurrent TCP Session Connection Management System on FPGA Chip
by Ke Wang, Yunfei Guo and Zhichuan Guo
Micromachines 2023, 14(2), 385; https://doi.org/10.3390/mi14020385 - 3 Feb 2023
Cited by 1 | Viewed by 1537
Abstract
Transmission Control Protocol (TCP) is a connection-oriented data transmission protocol and the main communication protocol used for end-to-end data transmission on the current Internet. At present, mainstream TCP protocol processing is implemented in software running on the Central Processing Unit (CPU). However, with the rapid growth of transmission bandwidth and the number of connections, software-based processing falls short in terms of delay and throughput, and it also degrades the CPU's performance in other applications such as virtualization services. Moreover, existing hardware solutions can only support a limited number of TCP session connections. To improve the processing efficiency of the TCP protocol and achieve highly concurrent network services, this paper proposes a TCP offload engine (TOE) prototype system based on field-programmable gate array (FPGA) chips. It not only provides hardware-based data-path processing, but also realizes hardware management of large-scale TCP session connection state through a multi-level cache management mechanism. Experiments show that this solution can sustain 100 Gbps throughput while maintaining the state of up to 250,000 concurrent TCP connections in hardware on a single network node, improving the overall performance of the network system. Full article
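Multi-level session-state management can be pictured as a small fast table in front of a much larger backing store. The sketch below is a rough software analogy (class, method, and capacity names are hypothetical, not the paper's design), with an LRU-evicted "on-chip" level backed by an "off-chip" level:

```python
from collections import OrderedDict

class SessionTable:
    """Toy two-level TCP connection-state store: a small fast cache
    (on-chip BRAM analogue) in front of a large table (off-chip DRAM
    analogue), evicting least-recently-used entries on overflow."""

    def __init__(self, cache_size):
        self.cache = OrderedDict()   # fast level, limited capacity
        self.store = {}              # capacity level, holds evicted state
        self.cache_size = cache_size

    def update(self, conn_id, state):
        self.cache[conn_id] = state
        self.cache.move_to_end(conn_id)            # mark most recently used
        if len(self.cache) > self.cache_size:      # evict LRU entry downward
            old_id, old_state = self.cache.popitem(last=False)
            self.store[old_id] = old_state

    def lookup(self, conn_id):
        if conn_id in self.cache:                  # fast-path hit
            self.cache.move_to_end(conn_id)
            return self.cache[conn_id]
        return self.store.get(conn_id)             # slow-path fetch
```

The point of the structure is that the common case (recently active connections) stays in the fast level, while total capacity scales with the large backing level.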

22 pages, 2031 KiB  
Article
Real-Time RISC-V-Based CAN-FD Bus Diagnosis Tool
by Cosmin-Andrei Popovici and Andrei Stan
Micromachines 2023, 14(1), 196; https://doi.org/10.3390/mi14010196 - 12 Jan 2023
Cited by 2 | Viewed by 1737
Abstract
Network diagnosis tools of industrial-grade quality are not widely available to common users such as researchers and students. This kind of tool enables users to develop distributed embedded systems using low-cost and reliable setups. In the context of RISC-V extensions and domain-specific architectures, this paper proposes a real-time RISC-V-based CAN-FD bus diagnosis tool, named RiscDiag CanFd, as an open-source alternative. The RISC-V core extension is a CAN-FD communication unit controlled by a dedicated ISA extension. Besides the extended RISC-V core, the proposed SoC provides UDP communication via Ethernet for connecting the solution to a PC. Additionally, a GUI application was developed for accessing and using the hardware solution deployed in an FPGA. The proposed solution is evaluated by measuring the frame loss rate, the timestamp precision of captured frames, and the latency of preparing data for Ethernet communication. Measurements revealed a 0% frame loss rate, a timestamp error under 0.001%, and an acquisition cycle jitter under 10 ns. Full article

19 pages, 3856 KiB  
Article
YOLOv4-Tiny-Based Coal Gangue Image Recognition and FPGA Implementation
by Shanyong Xu, Yujie Zhou, Yourui Huang and Tao Han
Micromachines 2022, 13(11), 1983; https://doi.org/10.3390/mi13111983 - 16 Nov 2022
Cited by 8 | Viewed by 2085
Abstract
Nowadays, most deep learning coal gangue identification methods need to run on high-performance CPU or GPU hardware, which is inconvenient in complex underground coal mine environments due to high power consumption, large size, and significant heat generation. To resolve these problems, this paper proposes a coal gangue identification method based on YOLOv4-tiny and deploys it on the low-power FPGA hardware platform. First, the YOLOv4-tiny model is trained on the computer platform, and the computation of the model is reduced through 16-bit fixed-point quantization and the fusion of the BN layer into the convolution layer. Second, convolution and pooling IP kernels are designed on the FPGA platform to accelerate these computations, using three optimization methods: input and output channel parallelism, pipelining, and ping-pong operation. Finally, the FPGA hardware system design of the whole algorithm is completed. Experimental results on a self-made coal gangue data set indicate that the recognition precision of the proposed algorithm on the FPGA platform is slightly lower than that of the CPU and GPU, with an mAP of 96.56%; the recognition speed of 0.376 s per image falls between those of the CPU and GPU; the hardware power consumption of the FPGA platform is only 2.86 W; and the energy efficiency ratio is 10.42 and 3.47 times that of the CPU and GPU, respectively. Full article
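The BN-into-convolution fusion mentioned above follows a standard algebraic folding, and 16-bit fixed-point quantization can be illustrated alongside it. This is a generic one-channel sketch under stated assumptions, not the paper's exact implementation; in particular the Q8.8 integer/fraction split is an assumption:

```python
import math

def fold_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold a batch-normalization layer into the preceding convolution's
    weight w and bias b (scalar, per-channel view), so that
        y = gamma * (w*x + b - mean) / sqrt(var + eps) + beta
    becomes a single multiply-add with the returned (w', b')."""
    s = gamma / math.sqrt(var + eps)
    return w * s, (b - mean) * s + beta

def to_q8_8(x):
    """Illustrative 16-bit fixed-point (Q8.8) quantization with saturation:
    8 integer bits, 8 fraction bits, clamped to the int16 range."""
    q = round(x * 256)
    return max(-32768, min(32767, q)) / 256
```

Folding removes the BN multiply-add at inference time, and the reduced-precision weights and activations shrink the FPGA multipliers; the paper combines both to cut computation before mapping to hardware.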

17 pages, 655 KiB  
Article
A High-Performance and Flexible Architecture for Accelerating SDN on the MPSoC Platform
by Meng Sha, Zhichuan Guo, Yunfei Guo and Xuewen Zeng
Micromachines 2022, 13(11), 1854; https://doi.org/10.3390/mi13111854 - 29 Oct 2022
Cited by 3 | Viewed by 1205
Abstract
Software-defined networking has been developing in recent years, and the separation of the control plane from the data plane has made networks more flexible. Due to this flexibility, the data plane is often implemented in software. However, with increasing network speed, the CPU is becoming unable to meet the requirements of high-speed packet processing. FPGAs are usually used as dumb switches to accelerate the data plane, with all intelligence centralized in the remote controller. However, the cost of taking the intelligence out of the switch is increased latency between the controller and the switch. Therefore, we argue that control decisions should be made as locally as possible. In this paper, we propose a novel high-performance and flexible architecture for accelerating SDN based on the MPSoC platform. The control plane is implemented on the on-chip CPU and the data plane in the FPGA logic, with the two components communicating over Ethernet. We design a high-performance TCAM based on distributed RAM. The architecture employs a pipeline design with modules connected via the AXI-Stream interface. The designed architecture is flexible enough to support multiple network functions while achieving high performance at 100 Gbps. To the best of our knowledge, this is the first such architecture proposed for a 100 Gbps system. Full article
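A TCAM performs priority ternary matching: each entry carries "don't care" bits, and the first matching entry in priority order wins. The sketch below shows only that matching semantics, which the paper's design emulates using FPGA distributed RAM (entry layout and names here are illustrative, not the paper's encoding):

```python
def tcam_lookup(entries, key):
    """Priority ternary match. Each entry is (value, mask, result), where
    mask bits set to 0 are 'don't care'. The first entry whose masked value
    equals the masked key wins; entries earlier in the list have priority."""
    for value, mask, result in entries:
        if key & mask == value & mask:
            return result
    return None                      # no entry matched

# Example rule table: an exact match shadows a broader prefix match.
rules = [
    (0b1010, 0b1111, "exact"),       # all four bits must match
    (0b1000, 0b1100, "prefix"),      # only the top two bits must match
]
```

A hardware TCAM evaluates all entries in parallel in one cycle; emulating it in distributed RAM trades that single-cycle parallel compare for LUT-based lookups that fit standard FPGA fabric.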
