Hardware for Machine Learning

A special issue of Journal of Low Power Electronics and Applications (ISSN 2079-9268).

Deadline for manuscript submissions: closed (1 March 2022) | Viewed by 21109

Special Issue Editors

Department of Electrical and Computer Engineering, Northeastern University, Boston, MA 02115, USA
Interests: ultra-low power circuits and systems; analog computing; precision circuits; hardware security
Electrical and Computer Engineering, University of Delaware, Newark, DE 19716, USA
Interests: mixed-signal IC design; CMOS photonic ICs; RF/mm-wave photonics; neuromorphic circuits
UM-SJTU Joint Institute, Shanghai Jiao Tong University, Shanghai 200240, China
Interests: EDA; computer architecture; low power VLSI; hardware acceleration

Special Issue Information

Dear Colleagues,

This Special Issue focuses on hardware and circuit design methods for machine learning applications. It will include invited papers covering a range of topics: the large-scale integration of CMOS mixed-signal integrated circuits and nanoscale emerging devices, to enable a new generation of integrated circuits and systems applicable to a wide range of machine learning problems; on-device learning; in-memory computing; neuromorphic deep learning; and system-level aspects of Edge AI.

The rationale of this Special Issue is to assemble a compelling volume of research in the emerging field of neuromorphic and machine learning (ML) circuits and systems, and to present advances in this area of growing importance. We believe that this topic is timely and compelling, as there is a growing need to train ML and artificial intelligence (AI) algorithms on low-power platforms that can potentially provide orders-of-magnitude improvements in energy efficiency compared with the present focus on graphics processing units (GPUs), field-programmable gate arrays (FPGAs), and digital application-specific integrated circuits (ASICs). Low-power mixed-signal circuits that leverage conventional and emerging non-volatile devices, such as resistive RAM (RRAM) and phase-change RAM (PCRAM), are potential candidates for achieving this energy efficiency with very high synaptic density. Further, such non-von Neumann architectures need to be completely rethought, as they will require entirely new ways of programming and managing resources. There are several open challenges in this area at the device, circuit, algorithm, and system levels, and the papers in this Special Issue address some of them in a timely manner.

Dr. Aatmesh Shrivastava
Dr. Vishal Saxena
Dr. Xinfei Guo

Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, authors can proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Journal of Low Power Electronics and Applications is an international peer-reviewed open access quarterly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (6 papers)


Research

17 pages, 6762 KiB  
Article
Implementing a Timing Error-Resilient and Energy-Efficient Near-Threshold Hardware Accelerator for Deep Neural Network Inference
J. Low Power Electron. Appl. 2022, 12(2), 32; https://doi.org/10.3390/jlpea12020032 - 06 Jun 2022
Cited by 2 | Viewed by 2407
Abstract
Increasing processing requirements in the Artificial Intelligence (AI) realm have led to the emergence of domain-specific architectures for Deep Neural Network (DNN) applications. The Tensor Processing Unit (TPU), a DNN accelerator by Google, has emerged as a front runner, outclassing its contemporaries, CPUs and GPUs, in performance by 15×–30×. TPUs have been deployed in Google data centers to cater to these performance demands. However, a TPU's performance enhancement is accompanied by mammoth power consumption. In pursuit of lower energy utilization, this paper proposes PREDITOR, a low-power TPU operating in the Near-Threshold Computing (NTC) realm. PREDITOR uses mathematical analysis to mitigate undetectable timing errors by boosting the voltage of selective multiplier-and-accumulator units at specific intervals to enhance the performance of the NTC TPU, thereby ensuring high inference accuracy at low voltage. PREDITOR offers up to 3×–5× improved performance in comparison to leading-edge error mitigation schemes, with a minor loss in accuracy.
(This article belongs to the Special Issue Hardware for Machine Learning)

35 pages, 2845 KiB  
Article
Embedded Object Detection with Custom LittleNet, FINN and Vitis AI DCNN Accelerators
J. Low Power Electron. Appl. 2022, 12(2), 30; https://doi.org/10.3390/jlpea12020030 - 20 May 2022
Cited by 3 | Viewed by 3704
Abstract
Object detection is an essential component of many systems used, for example, in advanced driver assistance systems (ADAS) or advanced video surveillance systems (AVSS). Currently, the highest detection accuracy is achieved by solutions using deep convolutional neural networks (DCNNs). Unfortunately, these come at the cost of high computational complexity; hence, work on accelerating these algorithms is very important and timely. In this work, we compare three different DCNN hardware accelerator implementation methods: coarse-grained (a custom accelerator called LittleNet), fine-grained (FINN) and sequential (Vitis AI). We evaluate the approaches in terms of object detection accuracy, throughput and energy usage on the VOT and VTB datasets. We also present the limitations of each of the methods considered. We describe the whole process of DNN implementation, including architecture design, training, quantisation and hardware implementation. We used two custom DNN architectures to obtain higher accuracy, higher throughput and lower energy consumption. The first was implemented in SystemVerilog and the second with the FINN tool from AMD Xilinx. Both approaches were then compared with the Vitis AI tool from AMD Xilinx. The final implementations were tested on the Avnet Ultra96-V2 development board with the Zynq UltraScale+ MPSoC ZU3EG device. For the two custom DNN architectures, we achieved a throughput of 196 fps for our custom accelerator and 111 fps for FINN. The same networks implemented with Vitis AI achieved 123.3 fps and 53.3 fps, respectively.
(This article belongs to the Special Issue Hardware for Machine Learning)

25 pages, 1436 KiB  
Article
Low-Overhead Reinforcement Learning-Based Power Management Using 2QoSM
J. Low Power Electron. Appl. 2022, 12(2), 29; https://doi.org/10.3390/jlpea12020029 - 19 May 2022
Cited by 1 | Viewed by 2498
Abstract
With the computational systems of even embedded devices becoming ever more powerful, there is a need for more effective and proactive methods of dynamic power management. The work presented in this paper demonstrates the effectiveness of a reinforcement-learning-based dynamic power manager placed in a software framework. This combination of Q-learning for determining policy and the software abstractions provides many of the benefits of co-design, namely good performance, responsiveness and application guidance, with the flexibility to easily change policies or platforms. The Q-learning-based Quality of Service Manager (2QoSM) is implemented on an autonomous robot built on a complex, powerful embedded single-board computer (SBC) and a high-resolution path-planning algorithm. We find that the 2QoSM reduces power consumption by up to 42% compared with the Linux on-demand governor and by 10.2% compared with a state-of-the-art situation-aware governor. Moreover, performance as measured by path error is improved by up to 6.1%, all while saving power.
(This article belongs to the Special Issue Hardware for Machine Learning)
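The abstract above couples a learned policy with software abstractions. As a rough illustration of the underlying technique, a minimal tabular Q-learning loop for a power-state policy might look like the following sketch; the states, actions, reward shape, and hyperparameters are illustrative assumptions, not the actual 2QoSM design:

```python
import random

# Minimal tabular Q-learning sketch for a DVFS-style power manager.
# States, actions, and constants below are illustrative placeholders.
STATES = range(4)            # e.g. discretized workload/QoS levels
ACTIONS = range(3)           # e.g. low / medium / high frequency
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

# Q-table: expected long-term reward for taking action a in state s.
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}

def choose_action(state):
    """Epsilon-greedy policy over the Q-table."""
    if random.random() < EPSILON:
        return random.choice(list(ACTIONS))   # explore
    return max(ACTIONS, key=lambda a: Q[(state, a)])  # exploit

def update(state, action, reward, next_state):
    """Standard Q-learning update toward the bootstrapped target."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```

In a real power manager, the state would encode workload and QoS measurements, the action would select a frequency or governor setting, and the reward would combine measured power with application error.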

26 pages, 988 KiB  
Article
A Framework for Ultra Low-Power Hardware Accelerators Using NNs for Embedded Time Series Classification
J. Low Power Electron. Appl. 2022, 12(1), 2; https://doi.org/10.3390/jlpea12010002 - 31 Dec 2021
Viewed by 3335
Abstract
In embedded applications that use neural networks (NNs) for classification tasks, it is important to minimize the power consumption not only of the NN calculation but of the whole system. Optimization approaches for individual parts exist, such as quantization of the NN or analog calculation of arithmetic operations. However, there is no holistic approach for a complete embedded system design that is generic enough in the design process to be used for different applications, yet specific enough in the hardware implementation to waste no energy for a given application. Therefore, we present a novel framework that allows an end-to-end ASIC implementation of low-power hardware for time series classification using NNs. This includes a neural architecture search (NAS), which optimizes the NN configuration for accuracy and energy efficiency at the same time. This optimization targets a custom-designed hardware architecture derived from the key properties of time series classification tasks. Additionally, a hardware generation tool is used that creates a complete system from the definition of the NN. This system uses local multi-level RRAM memory for weight and bias storage to avoid external memory accesses. Exploiting the non-volatility of these devices, such a system can use a power-down mode to save significant energy during the data acquisition process. Detection of atrial fibrillation (AFib) in electrocardiogram (ECG) data is used as an example for evaluation of the framework. It is shown that a reduction of more than 95% in energy consumption compared to state-of-the-art solutions is achieved.
(This article belongs to the Special Issue Hardware for Machine Learning)
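The NAS described above jointly optimizes accuracy and energy efficiency. As a generic, much-simplified sketch of that idea, a random search over a tiny configuration space could score candidates on accuracy minus an energy proxy. The search space, cost model, and weighting here are assumptions for illustration, not the paper's NAS:

```python
import random

# Toy multi-objective NAS sketch: random search scoring each candidate
# on accuracy minus a weighted energy proxy. All values are assumptions.
SEARCH_SPACE = {
    "layers": [2, 3, 4],
    "channels": [8, 16, 32],
    "bits": [4, 8],          # quantization width
}

def sample_config(rng):
    """Draw one candidate configuration from the search space."""
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}

def energy_proxy(cfg):
    # Toy cost model: energy grows with depth, width, and precision.
    return cfg["layers"] * cfg["channels"] * cfg["bits"]

def search(evaluate_accuracy, trials=20, weight=1e-3, seed=0):
    """Return the sampled config maximizing accuracy - weight * energy."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(trials):
        cfg = sample_config(rng)
        score = evaluate_accuracy(cfg) - weight * energy_proxy(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg
```

In practice, `evaluate_accuracy` would train or fine-tune the candidate on the target task, and the energy term would come from a hardware-aware model of the generated ASIC rather than this toy product.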

14 pages, 6098 KiB  
Article
Implementation of Multi-Exit Neural-Network Inferences for an Image-Based Sensing System with Energy Harvesting
J. Low Power Electron. Appl. 2021, 11(3), 34; https://doi.org/10.3390/jlpea11030034 - 04 Sep 2021
Cited by 5 | Viewed by 3225
Abstract
Wireless sensor systems powered by batteries are widely used in a variety of applications. For applications with space limitations, their size is reduced, which limits battery energy capacity and memory storage size. A multi-exit neural network makes it possible to overcome these limitations by filtering out data without objects of interest, thereby avoiding computing the entire neural network. This paper proposes implementing a multi-exit convolutional neural network on the ESP32-CAM embedded platform as an image-sensing system with an energy constraint. The multi-exit design saves 42.7% of the energy compared with the single-exit condition. A simulation result, based on an exemplary natural outdoor light profile and the measured energy consumption of the proposed system, shows that the system can sustain its operation on a 3.2 kJ (275 mAh @ 3.2 V) battery while sacrificing only 2.7% of accuracy.
(This article belongs to the Special Issue Hardware for Machine Learning)
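The control flow of such a multi-exit network can be sketched in a few lines: run the backbone stage by stage, and return from the first classifier head whose confidence clears a threshold, so later stages never execute for easy inputs. The stage and head functions and the threshold below are placeholders, not the paper's ESP32-CAM model:

```python
import math

def softmax_confidence(logits):
    """Max softmax probability, used as the exit-confidence score."""
    m = max(logits)                      # subtract max for stability
    exps = [math.exp(x - m) for x in logits]
    return max(exps) / sum(exps)

def multi_exit_infer(x, stages, heads, threshold=0.9):
    """Run stages in order; exit early once a head is confident enough."""
    for stage, head in zip(stages, heads):
        x = stage(x)                     # backbone segment
        logits = head(x)                 # exit classifier at this depth
        if softmax_confidence(logits) >= threshold:
            return logits                # early exit: skip later stages
    return logits                        # final exit
```

The energy saving comes from the skipped stages: frames without objects of interest terminate at the first head, so the deeper (and more expensive) layers run only when needed.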

25 pages, 1814 KiB  
Article
A Dynamic Reconfigurable Architecture for Hybrid Spiking and Convolutional FPGA-Based Neural Network Designs
J. Low Power Electron. Appl. 2021, 11(3), 32; https://doi.org/10.3390/jlpea11030032 - 17 Aug 2021
Cited by 16 | Viewed by 4584
Abstract
This work presents a dynamically reconfigurable architecture for Neural Network (NN) accelerators implemented on a Field-Programmable Gate Array (FPGA) that can be applied in a variety of application scenarios. Although the concept of Dynamic Partial Reconfiguration (DPR) is increasingly used in NN accelerators, throughput is usually lower than in purely static designs. This work presents a dynamically reconfigurable, energy-efficient accelerator architecture that does not sacrifice throughput performance. The proposed accelerator comprises reconfigurable processing engines and dynamically utilizes the device resources according to model parameters. Using the proposed architecture with DPR, different NN types and architectures can be realized on the same FPGA. Moreover, the proposed architecture maximizes throughput performance with design optimizations while considering the available resources on the hardware platform. We evaluate our design with different NN architectures for two different tasks. The first task is image classification on two distinct datasets, which requires switching between Convolutional Neural Network (CNN) architectures with different layer structures. The second task requires switching between NN architectures, namely a CNN architecture with high accuracy and throughput and a hybrid architecture that combines convolutional layers with an optimized Spiking Neural Network (SNN) architecture. We demonstrate throughput results obtained by quickly reprogramming only a tiny part of the FPGA hardware using DPR. Experimental results show that the implemented designs achieve a 7× faster frame rate than current FPGA accelerators while being extremely flexible and using comparable resources.
(This article belongs to the Special Issue Hardware for Machine Learning)
