Recent Advances in Embedded Computing, Intelligence and Applications

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: closed (31 December 2020) | Viewed by 36423

Printed Edition Available!
A printed edition of this Special Issue is available.

Special Issue Editors


Prof. Dr. Jorge Portilla
Guest Editor
Centro de Electrónica Industrial, Universidad Politécnica de Madrid, 28006 Madrid, Spain
Interests: embedded-system design; wireless sensor networks; Internet of Things; FPGA-based reconfigurable systems

Prof. Dr. Andrés Otero
Guest Editor
Centro de Electrónica Industrial, Universidad Politécnica de Madrid, 28006 Madrid, Spain
Interests: embedded-system design; 3D vision; FPGA-based reconfigurable systems; machine learning at the edge

Prof. Dr. Gabriel Mujica
Guest Editor
Centro de Electrónica Industrial, Universidad Politécnica de Madrid, 28006 Madrid, Spain
Interests: embedded-system design; Internet of Things; IoT deployments; sensor-network communication protocols; algorithm distribution in IoT

Special Issue Information

Dear Colleagues,

The recent proliferation of Internet of Things deployments and edge computing, combined with artificial intelligence, has led to exciting new application scenarios in which embedded digital devices are key enablers of such ecosystems. Moreover, new powerful and efficient devices are appearing that can cope with workloads formerly reserved for the cloud, such as deep learning, processing data close to where they are generated and avoiding bottlenecks caused by communication limitations.

In this Special Issue, we look forward to presenting new work on offloading processing tasks from the cloud to the edge using the new devices now available on the market, including neural accelerators, FPGAs, and embedded processors with AI enhancements. Application papers showing implementations and deployments in this context are also very welcome.

Prof. Dr. Jorge Portilla
Prof. Dr. Andrés Otero
Prof. Dr. Gabriel Mujica
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Artificial intelligence at the edge
  • Neural processors in edge platforms
  • Artificial intelligence in IoT
  • Deep learning at the edge
  • Edge computing in embedded electronics
  • Machine learning in IoT devices
  • Machine learning applications at the edge

Published Papers (9 papers)


Research

14 pages, 470 KiB  
Article
Improving Performance Estimation for Design Space Exploration for Convolutional Neural Network Accelerators
by Martin Ferianc, Hongxiang Fan, Divyansh Manocha, Hongyu Zhou, Shuanglong Liu, Xinyu Niu and Wayne Luk
Electronics 2021, 10(4), 520; https://doi.org/10.3390/electronics10040520 - 23 Feb 2021
Cited by 7 | Viewed by 2463
Abstract
Contemporary advances in neural networks (NNs) have demonstrated their potential in different applications such as image classification, object detection, and natural language processing. In particular, reconfigurable accelerators have been widely used for the acceleration of NNs due to their reconfigurability and efficiency in specific application instances. To determine the configuration of the accelerator, it is necessary to conduct design space exploration to optimize the performance. However, the process of design space exploration is time-consuming because of the slow performance evaluation for different configurations. Therefore, there is a demand for an accurate and fast performance prediction method to speed up design space exploration. This work introduces a novel method for fast and accurate estimation of different metrics that are of importance when performing design space exploration. The method is based on a Gaussian process regression model parametrised by the features of the accelerator and the target NN to be accelerated. We evaluate the proposed method together with other popular machine-learning-based methods in estimating the latency and energy consumption of our implemented accelerator on two different hardware platforms targeting convolutional neural networks. We demonstrate improvements in estimation accuracy, without the need for significant implementation effort or tuning.
(This article belongs to the Special Issue Recent Advances in Embedded Computing, Intelligence and Applications)
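
As a rough illustration of the estimator described above, the sketch below fits a Gaussian process regressor to a handful of (accelerator configuration, latency) pairs and returns a mean prediction with an uncertainty estimate for an unexplored design point. The feature set, data values, and kernel choice are invented for the example and are not taken from the paper.

```python
# Minimal sketch of GP-based latency estimation for design space exploration.
# All features and measurements below are hypothetical placeholders.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Each row: [parallelism, buffer_kB, clock_MHz, conv_layers] for one design point.
X_train = np.array([
    [4,   64, 100,  8],
    [8,  128, 150,  8],
    [16, 256, 200, 16],
    [8,   64, 200, 16],
], dtype=float)
y_train = np.array([12.4, 7.1, 4.8, 6.3])  # measured latency in ms (made up)

kernel = ConstantKernel(1.0) * RBF(length_scale=[10.0, 100.0, 100.0, 10.0])
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(X_train, y_train)

# Query an unexplored configuration: the GP returns a mean and a standard
# deviation that a DSE loop can use to decide whether synthesis is worthwhile.
x_new = np.array([[16, 128, 150, 16]], dtype=float)
mean, std = gp.predict(x_new, return_std=True)
print(f"predicted latency: {mean[0]:.2f} ms +/- {std[0]:.2f} ms")
```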

12 pages, 5120 KiB  
Article
The Design of a 2D Graphics Accelerator for Embedded Systems
by Hyun Woo Oh, Ji Kwang Kim, Gwan Beom Hwang and Seung Eun Lee
Electronics 2021, 10(4), 469; https://doi.org/10.3390/electronics10040469 - 15 Feb 2021
Cited by 4 | Viewed by 4092
Abstract
Recently, advances in technology have enabled embedded systems to be adopted for a variety of applications. Some of these applications require real-time 2D graphics processing running on limited design specifications such as low power consumption and a small area. In order to satisfy such conditions, including a specific 2D graphics accelerator in the embedded system is an effective method. This method reduces the workload of the processor in the embedded system by exploiting the accelerator. The accelerator assists the system in performing 2D graphics processing in real time. Therefore, a variety of applications that require 2D graphics processing can be implemented with an embedded processor. In this paper, we present a 2D graphics accelerator for tiny embedded systems. The accelerator includes an optimized line-drawing operation based on Bresenham’s algorithm. The optimized operation enables the accelerator to deal with various kinds of 2D graphics processing and to perform the line drawing instead of the system processor. Moreover, the accelerator also distributes the workload of the processor core by removing the need for the core to access the frame buffer memory. We measure the performance of the accelerator by implementing the processor, including the accelerator, on a field-programmable gate array (FPGA), and ascertain the feasibility of realization by synthesizing the design in a 180 nm CMOS process.
(This article belongs to the Special Issue Recent Advances in Embedded Computing, Intelligence and Applications)
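
The line-drawing operation mentioned in the abstract is based on Bresenham’s algorithm, which uses only integer additions, subtractions and comparisons. The generic textbook version below (a software sketch, not the paper’s hardware design) shows the computation the accelerator performs instead of the processor core.

```python
def bresenham_line(x0, y0, x1, y1):
    """Return the pixel coordinates of a line from (x0, y0) to (x1, y1) using
    only integer arithmetic, which is what makes the algorithm cheap in hardware."""
    pixels = []
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy                # combined error term
    while True:
        pixels.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:             # error says: step in x
            err += dy
            x0 += sx
        if e2 <= dx:             # error says: step in y
            err += dx
            y0 += sy
    return pixels

# Example: a hardware accelerator would write these coordinates straight into
# the frame buffer instead of returning a Python list.
print(bresenham_line(0, 0, 7, 3))
```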

21 pages, 1351 KiB  
Article
Optimising Hardware Accelerated Neural Networks with Quantisation and a Knowledge Distillation Evolutionary Algorithm
by Robert Stewart, Andrew Nowlan, Pascal Bacchus, Quentin Ducasse and Ekaterina Komendantskaya
Electronics 2021, 10(4), 396; https://doi.org/10.3390/electronics10040396 - 05 Feb 2021
Cited by 13 | Viewed by 3145
Abstract
This paper compares the latency, accuracy, training time and hardware costs of neural networks compressed with our new multi-objective evolutionary algorithm called NEMOKD, and with quantisation. We evaluate NEMOKD on Intel’s Movidius Myriad X VPU processor, and quantisation on Xilinx’s programmable Z7020 FPGA hardware. Evolving models with NEMOKD increases inference accuracy by up to 82% at the cost of 38% increased latency, with throughput performance of 100–590 image frames per second (FPS). Quantisation identifies a sweet spot of 3-bit precision in the trade-off between latency, hardware requirements, training time and accuracy. Parallelising FPGA implementations of 2- and 3-bit quantised neural networks increases throughput from 6 k FPS to 373 k FPS, a 62× speedup.
(This article belongs to the Special Issue Recent Advances in Embedded Computing, Intelligence and Applications)
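
As a generic illustration of the quantisation side of the study (not the authors’ exact flow), the sketch below applies symmetric uniform quantisation to a weight tensor at a configurable bit width, which is the basic operation behind the 2- and 3-bit networks mentioned above.

```python
import numpy as np

def quantise_symmetric(weights, bits=3):
    """Symmetric uniform quantisation of a weight tensor to `bits` bits.
    Returns the simulated (dequantised) weights and the scale factor."""
    qmax = 2 ** (bits - 1) - 1                      # integer codes -qmax..qmax
    scale = np.max(np.abs(weights)) / qmax
    q = np.clip(np.round(weights / scale), -qmax, qmax)
    return q * scale, scale

w = np.random.randn(4, 4).astype(np.float32)        # stand-in layer weights
w_q, s = quantise_symmetric(w, bits=3)
print("scale:", s, "max abs error:", np.max(np.abs(w - w_q)))
```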

22 pages, 16946 KiB  
Article
A Flexible Fog Computing Design for Low-Power Consumption and Low Latency Applications
by Markos Losada, Ainhoa Cortés, Andoni Irizar, Javier Cejudo and Alejandro Pérez
Electronics 2021, 10(1), 57; https://doi.org/10.3390/electronics10010057 - 31 Dec 2020
Cited by 10 | Viewed by 2959
Abstract
In this paper, we propose a flexible Fog Computing architecture whose main features are that it allows us to select between two different communication links (WiFi and LoRa) on the fly and offers a low-power solution, thanks to the power management strategies applied at the hardware and firmware levels. The proposed Fog Computing architecture is formed by sensor nodes and an Internet of Things (IoT) gateway. In the case of LoRa, we have the choice of implementing the LoRaWAN and Application servers on the cloud or on the IoT gateway, avoiding, in the latter case, sending data to the Cloud. Additionally, we present a specific setup and methodology aimed at measuring the sensor node’s power consumption and ensuring a fair comparison between the two selected wireless communication links by varying the duty cycle, the size of the payload, and the Spreading Factor (SF). This research work is in the scope of the STARPORTS Interconnecta Project, where we have deployed two sensor nodes on the offshore platform of PLOCAN, which communicate with the IoT gateway located on the PLOCAN premises. In this case, we have used LoRa communications due to the large distance required between the IoT gateway and the nodes on the offshore platform (in the range of kilometers). This deployment demonstrates that the proposed solution operates in a real environment and that it is a low-power and robust approach, since it has been sending data to the IoT gateway for more than one year and continues to work.
(This article belongs to the Special Issue Recent Advances in Embedded Computing, Intelligence and Applications)
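
The power comparison hinges on how long the radio stays on for each transmission. The sketch below computes LoRa time-on-air from the Semtech SX127x datasheet formula (the parameter values here are illustrative, not the ones measured in the paper) to show how payload size and Spreading Factor drive airtime, and hence energy per message.

```python
import math

def lora_time_on_air(payload_bytes, sf, bw_hz=125_000, cr=1,
                     preamble_symbols=8, explicit_header=True, crc=True):
    """Approximate LoRa time-on-air in seconds (Semtech SX127x datasheet formula).
    cr=1 means coding rate 4/5."""
    t_sym = (2 ** sf) / bw_hz
    de = 1 if t_sym > 0.016 else 0          # low data rate optimisation for long symbols
    ih = 0 if explicit_header else 1
    crc_bits = 16 if crc else 0
    num = 8 * payload_bytes - 4 * sf + 28 + crc_bits - 20 * ih
    n_payload = 8 + max(math.ceil(num / (4 * (sf - 2 * de))) * (cr + 4), 0)
    t_preamble = (preamble_symbols + 4.25) * t_sym
    return t_preamble + n_payload * t_sym

for sf in (7, 9, 12):
    print(f"SF{sf}: {1000 * lora_time_on_air(20, sf):.1f} ms for a 20-byte payload")
```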

10 pages, 2193 KiB  
Article
Energy Efficiency of Machine Learning in Embedded Systems Using Neuromorphic Hardware
by Minseon Kang, Yongseok Lee and Moonju Park
Electronics 2020, 9(7), 1069; https://doi.org/10.3390/electronics9071069 - 30 Jun 2020
Cited by 15 | Viewed by 4199
Abstract
Recently, the application of machine learning on embedded systems has drawn interest in both the research community and industry, because embedded systems located at the edge can produce a faster response and reduce network load. However, software implementation of neural networks on Central Processing Units (CPUs) is considered infeasible in embedded systems due to the limited power supply. To accelerate AI processing, the many-core Graphics Processing Unit (GPU) has been preferred over the CPU. However, its energy efficiency is still not considered good enough for embedded systems. Among other approaches for machine learning on embedded systems, neuromorphic processing chips are expected to consume less power and overcome the memory bottleneck. In this work, we implemented a pedestrian image detection system on an embedded device using a commercially available neuromorphic chip, NM500, which is based on NeuroMem technology. The NM500 processing time and power consumption were measured as the number of chips was increased from one to seven, and they were compared to those of a multicore CPU system and a GPU-accelerated embedded system. The results show that NM500 is more efficient, in terms of energy required to process data for both learning and classification, than the GPU-accelerated system or the multicore CPU system. Additionally, limitations and possible improvements of the current NM500 are identified based on the experimental results.
(This article belongs to the Special Issue Recent Advances in Embedded Computing, Intelligence and Applications)
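
The energy metric used in such comparisons is simply average power multiplied by processing time. The sketch below shows the calculation with placeholder figures (not the measurements reported in the paper) to make the trade-off explicit: a slower, low-power device can still win on energy per batch.

```python
# Energy-per-batch comparison sketch; power and time figures are placeholders.
platforms = {
    "Neuromorphic chip": {"power_w": 0.6,  "time_s": 1.8},
    "Embedded GPU":      {"power_w": 10.0, "time_s": 0.2},
    "Multicore CPU":     {"power_w": 35.0, "time_s": 0.5},
}

for name, m in platforms.items():
    energy_j = m["power_w"] * m["time_s"]    # E = P * t
    print(f"{name:18s} {energy_j:6.2f} J per batch")
```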

36 pages, 5762 KiB  
Article
A Dynamically Reconfigurable BbNN Architecture for Scalable Neuroevolution in Hardware
by Alberto García, Rafael Zamacola, Andrés Otero and Eduardo de la Torre
Electronics 2020, 9(5), 803; https://doi.org/10.3390/electronics9050803 - 13 May 2020
Cited by 3 | Viewed by 7408
Abstract
In this paper, a novel hardware architecture for neuroevolution is presented, aiming to enable the continuous adaptation of systems working in dynamic environments by including the training stage intrinsically at the computing edge. It is based on the block-based neural network (BbNN) model, integrated with an evolutionary algorithm that optimizes the weights and the topology of the network simultaneously. Unlike state-of-the-art approaches, the proposed implementation makes use of advanced dynamic and partial reconfiguration features to reconfigure the network during evolution and, if required, to adapt its size dynamically. This way, the number of logic resources occupied by the network can be adapted by the evolutionary algorithm to the complexity of the problem, the expected quality of the results, or other performance indicators. The proposed architecture, implemented on a Xilinx Zynq-7020 System-on-a-Chip (SoC) FPGA device, reduces the usage of DSPs and BRAMs while introducing a novel synchronization scheme that controls the latency of the circuit. The proposed neuroevolvable architecture has been integrated with the OpenAI toolkit to show how it can be applied efficiently to control problems with variable complexity and dynamic behavior. The versatility of the solution is assessed by also targeting classification problems.
(This article belongs to the Special Issue Recent Advances in Embedded Computing, Intelligence and Applications)
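
Neuroevolution of this kind follows an evaluate-select-mutate loop in which the genome encodes both the network weights and its topology. The sketch below is a minimal, software-only (mu + lambda) loop with a dummy fitness function and an invented genome layout; in the paper, evaluation is performed by the dynamically reconfigured BbNN hardware itself.

```python
import random

def evaluate(genome):
    """Placeholder fitness; the paper evaluates the reconfigured BbNN in hardware."""
    weights, topology = genome
    return -sum(w * w for w in weights) + sum(topology)

def mutate(genome, weight_sigma=0.1, flip_prob=0.05):
    """Perturb weights with Gaussian noise and occasionally flip topology bits."""
    weights, topology = genome
    new_w = [w + random.gauss(0, weight_sigma) for w in weights]
    new_t = [b ^ (random.random() < flip_prob) for b in topology]
    return new_w, new_t

mu, lam, generations = 4, 12, 50
population = [([random.uniform(-1, 1) for _ in range(8)],   # block weights
               [random.randint(0, 1) for _ in range(8)])    # topology bits
              for _ in range(mu + lam)]

# (mu + lambda) evolution: keep the best mu parents, refill with their mutants.
for _ in range(generations):
    population.sort(key=evaluate, reverse=True)
    parents = population[:mu]
    population = parents + [mutate(random.choice(parents)) for _ in range(lam)]

print("best fitness:", evaluate(max(population, key=evaluate)))
```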

15 pages, 1167 KiB  
Article
Performance of Two Approaches of Embedded Recommender Systems
by Francisco Pajuelo-Holguera, Juan A. Gómez-Pulido and Fernando Ortega
Electronics 2020, 9(4), 546; https://doi.org/10.3390/electronics9040546 - 25 Mar 2020
Cited by 3 | Viewed by 2099
Abstract
Nowadays, highly portable and low-energy computing environments require programming applications able to satisfy computing time and energy constraints. Furthermore, collaborative filtering-based recommender systems are intelligent systems that use large databases and perform extensive matrix arithmetic calculations. In this research, we present an optimized algorithm and a parallel hardware implementation as a good approach for running embedded collaborative filtering applications. To this end, we have considered high-level synthesis programming for reconfigurable hardware technology. The design was tested in environments applying typical parameters and real-world datasets, and compared with general-purpose microprocessors running similar implementations. The performance results obtained by the different implementations were analyzed in terms of computing time and energy consumption. The main conclusion is that the optimized algorithm is competitive in embedded applications when considering large datasets and parallel implementations based on reconfigurable hardware.
(This article belongs to the Special Issue Recent Advances in Embedded Computing, Intelligence and Applications)
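
Collaborative filtering of the kind accelerated here typically reduces to matrix factorisation, whose multiply-accumulate-heavy inner loop is what maps well to FPGAs. The sketch below trains a tiny factorisation model with stochastic gradient descent on a made-up rating matrix; it illustrates the arithmetic that dominates the workload, not the authors’ optimized algorithm.

```python
import numpy as np

# Tiny user-item rating matrix (0 = unknown rating); purely illustrative data.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)

k, lr, reg, epochs = 2, 0.01, 0.02, 2000
rng = np.random.default_rng(0)
P = rng.normal(scale=0.1, size=(R.shape[0], k))    # user factors
Q = rng.normal(scale=0.1, size=(R.shape[1], k))    # item factors

for _ in range(epochs):
    for u, i in zip(*np.nonzero(R)):               # iterate over known ratings
        err = R[u, i] - P[u] @ Q[i]
        P[u] += lr * (err * Q[i] - reg * P[u])
        Q[i] += lr * (err * P[u] - reg * Q[i])

print(np.round(P @ Q.T, 2))   # predicted ratings, including the unknown cells
```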

26 pages, 3201 KiB  
Article
A Modular IoT Hardware Platform for Distributed and Secured Extreme Edge Computing
by Pablo Merino, Gabriel Mujica, Jaime Señor and Jorge Portilla
Electronics 2020, 9(3), 538; https://doi.org/10.3390/electronics9030538 - 24 Mar 2020
Cited by 8 | Viewed by 4289
Abstract
The hardware of networked embedded sensor nodes is in continuous evolution, from 8-bit MCU-based platforms such as Mica up to powerful Edge nodes that even include custom hardware devices, such as the FPGAs in the Cookies platform. This evolution brings issues related to the deployment of the Internet of Things, particularly in terms of performance and communication bottlenecks. Moreover, the associated integration process from the Edge up to the Cloud layer opens new security concerns that are key to ensuring end-to-end trustworthiness and interoperability. This work tackles these questions by proposing a novel embedded Edge platform based on an EFR32 SoC from Silicon Labs running the Contiki-NG OS, which includes an ARM Cortex-M4 MCU and an IEEE 802.15.4 transceiver used for resource-constrained, low-power communication. This IoT Edge node integrates security in hardware, adding support for confidentiality, integrity and availability, making this Edge node ultra-secure against most of the common attacks in wireless sensor networks. Part of this security relies on an energy-efficient hardware accelerator that handles identity authentication and session key creation and management. Furthermore, the modular hardware platform aims at providing reliability and robustness in low-power distributed sensing application contexts on what is called the Extreme Edge, and for that purpose a lightweight multi-hop routing strategy supporting dynamic discovery and interaction among participant devices is fully presented. This embedded algorithm has served as the baseline end-to-end communication capability to validate the IoT hardware platform through intensive experimental tests in a real deployment scenario.
(This article belongs to the Special Issue Recent Advances in Embedded Computing, Intelligence and Applications)
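
A lightweight multi-hop strategy of the sort described can be pictured as a discovery flood from the gateway in which every node keeps the neighbour that first reached it as its next hop. The sketch below simulates that idea over an invented topology; it is only a conceptual stand-in, not the protocol implemented on the platform.

```python
from collections import deque

# Adjacency list of an invented radio topology; "GW" is the IoT gateway.
links = {
    "GW": ["A", "B"],
    "A":  ["GW", "C"],
    "B":  ["GW", "C", "D"],
    "C":  ["A", "B", "E"],
    "D":  ["B", "E"],
    "E":  ["C", "D"],
}

def discover_routes(gateway="GW"):
    """Breadth-first flood from the gateway: each node remembers the neighbour
    it first heard the discovery from as its next hop towards the gateway."""
    next_hop, hops = {}, {gateway: 0}
    queue = deque([gateway])
    while queue:
        node = queue.popleft()
        for neigh in links[node]:
            if neigh not in hops:
                hops[neigh] = hops[node] + 1
                next_hop[neigh] = node
                queue.append(neigh)
    return next_hop, hops

routes, hops = discover_routes()
for node in sorted(routes):
    print(f"{node}: forward to {routes[node]}, {hops[node]} hop(s) from GW")
```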

17 pages, 2051 KiB  
Article
High-Level Synthesis of Multiclass SVM Using Code Refactoring to Classify Brain Cancer from Hyperspectral Images
by Abelardo Baez, Himar Fabelo, Samuel Ortega, Giordana Florimbi, Emanuele Torti, Abian Hernandez, Francesco Leporati, Giovanni Danese, Gustavo M. Callico and Roberto Sarmiento
Electronics 2019, 8(12), 1494; https://doi.org/10.3390/electronics8121494 - 06 Dec 2019
Cited by 8 | Viewed by 3701
Abstract
Currently, high-level synthesis (HLS) methods and tools are a highly relevant area in the strategy of several leading companies in the field of system-on-chips (SoCs) and field-programmable gate arrays (FPGAs). HLS facilitates the work of system developers, who benefit from integrated and automated design workflows that considerably reduce the design time. Although many advances have been made in this research field, there are still some uncertainties about the quality and performance of the designs generated with the use of HLS methodologies. In this paper, we propose an optimization of the HLS methodology by code refactoring using Xilinx SDSoC™ (Software-Defined System-on-Chip). Several options were analyzed for each alternative through code refactoring of a multiclass support vector machine (SVM) classifier written in C, using two different Zynq®-7000 SoC devices from Xilinx, the ZC7020 (ZedBoard) and the ZC7045 (ZC706). The classifier was evaluated using a brain cancer database of hyperspectral images. The proposed methodology not only reduces the required resources, using less than 20% of the FPGA, but also reduces the power consumption by 23% compared to the full implementation. The obtained speedup of 2.86× (ZC7045) is the highest found in the literature for SVM hardware implementations.
(This article belongs to the Special Issue Recent Advances in Embedded Computing, Intelligence and Applications)
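
At inference time, a multiclass SVM amounts to evaluating a set of pairwise decision functions over each hyperspectral pixel and voting on the class. The sketch below trains and applies such a classifier with scikit-learn on random stand-in data (the paper uses a labelled hyperspectral brain cancer database); it only shows the computation being moved into hardware, not the authors’ refactored C code.

```python
import numpy as np
from sklearn.svm import SVC

# Stand-in data: 500 "pixels" with 128 spectral bands and 4 classes.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 128))
y = rng.integers(0, 4, size=500)

# Linear multiclass SVM (scikit-learn uses one-vs-one decision functions);
# an HLS design would evaluate the equivalent dot products per pixel.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

pixels = rng.normal(size=(3, 128))    # new pixels to classify
print(clf.predict(pixels))            # predicted class label per pixel
```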