Convolutional Neural Network Design and Hardware Implementation for Real-Time Vision Applications

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: closed (31 October 2019) | Viewed by 67243

Special Issue Editor


Prof. Dr. Dah-Jye Lee
Guest Editor
Department of Electrical and Computer Engineering, Brigham Young University, 450 Engineering Building, Provo, UT 84602-4099, USA
Interests: artificial intelligence; high-performance visual computing; robotic vision; real-time visual inspection automation

Special Issue Information

Dear Colleagues,

Processing speed is critical for many visual computing tasks. Many computer vision algorithms generate accurate results but run too slowly to produce results in real time; others process at camera frame rates, but with reduced accuracy. Neither combination is ideal for real-time applications. Meanwhile, FPGAs are increasing in capacity and decreasing in power consumption, making them more attractive for embedded applications such as onboard vision and control for unmanned vehicles. Convolutional neural networks (CNNs) offer state-of-the-art accuracy for many computer vision tasks, and their capabilities generalize to many different real-world applications, which often demand real-time responsiveness from the vision system. This Special Issue focuses on CNNs and their application to real-time computer vision tasks.

General topics covered in this Special Issue include, but are not limited to:

  • FPGA-based hardware acceleration of vision algorithms;
  • GPU-based acceleration of vision algorithms;
  • Embedded vision sensors for applications that require real-time performance;
  • CNN architecture optimizations for real-time performance;
  • CNN acceleration through approximate computing;
  • GPU-based implementations for real-time CNN performance;
  • FPGA-based implementations for real-time CNN performance;
  • Real-time CNN performance on resource limited systems;
  • CNN applications that require real-time performance;
  • Tradeoff analysis between speed and accuracy in CNNs.

Prof. Dr. Dah-Jye Lee
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (12 papers)


Research


28 pages, 12120 KiB  
Article
A Low-Power Spike-Like Neural Network Design
by Michael Losh and Daniel Llamocca
Electronics 2019, 8(12), 1479; https://doi.org/10.3390/electronics8121479 - 04 Dec 2019
Cited by 6 | Viewed by 4976
Abstract
Modern massively-parallel Graphics Processing Units (GPUs) and Machine Learning (ML) frameworks enable neural network implementations of unprecedented performance and sophistication. However, state-of-the-art GPU hardware platforms are extremely power-hungry, while microprocessors cannot meet the performance requirements. Biologically-inspired Spiking Neural Networks (SNNs) have inherent characteristics that lead to lower power consumption. We thus present a bit-serial SNN-like hardware architecture. By using counters, comparators, and an indexing scheme, the design efficiently implements the sum-of-products computation inherent in neurons. In addition, we experimented with various strength-reduction methods to lower neural network resource usage. The proposed Spiking Hybrid Network (SHiNe), validated on an FPGA, achieves reasonable performance with low resource utilization, at some cost in hardware throughput and signal representation.
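The counter-and-comparator view of a neuron can be illustrated with a short simulation. The sketch below is our own illustration of the idea, not the authors' RTL: each input arrives as a serial train of unary spikes, a counter accumulates spikes scaled by small integer weights, and a comparator emits an output spike when the count crosses a threshold. The names `spike_trains`, `weights`, and `threshold` are assumptions.

```python
import numpy as np

def shine_like_neuron(spike_trains, weights, threshold):
    """Toy bit-serial neuron: counter + comparator, no multipliers.

    spike_trains: (n_inputs, n_ticks) binary array; each row is a
                  unary/serial encoding of one input value.
    weights:      small non-negative integers; a weight w lets each
                  input spike increment the counter w times.
    threshold:    comparator level at which the neuron emits a spike.
    Returns the output spike train over the same n_ticks.
    """
    n_inputs, n_ticks = spike_trains.shape
    counter = 0
    out = np.zeros(n_ticks, dtype=np.uint8)
    for t in range(n_ticks):
        # Counter: weighted spike accumulation (sum-of-products by counting).
        counter += int(np.dot(weights, spike_trains[:, t]))
        # Comparator: fire and subtract the threshold, keeping any remainder.
        if counter >= threshold:
            out[t] = 1
            counter -= threshold
    return out

trains = (np.random.rand(4, 32) < 0.3).astype(np.uint8)
w = np.array([1, 2, 0, 3])
print(shine_like_neuron(trains, w, threshold=8))
```

Because the weighted accumulation is just repeated counting, the datapath needs only adders and a comparator, which is where the power savings over a multiply-accumulate design come from.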

15 pages, 6106 KiB  
Article
A Modularized Architecture of Multi-Branch Convolutional Neural Network for Image Captioning
by Shan He and Yuanyao Lu
Electronics 2019, 8(12), 1417; https://doi.org/10.3390/electronics8121417 - 28 Nov 2019
Cited by 11 | Viewed by 2977
Abstract
Image captioning is a comprehensive task spanning computer vision (CV) and natural language processing (NLP): the algorithm automatically generates descriptive text for an input image. In this paper, we present an end-to-end model that uses a deep convolutional neural network (CNN) as the encoder and a recurrent neural network (RNN) as the decoder. To extract better features for captioning, we propose a highly modularized multi-branch CNN, which increases accuracy while keeping the number of hyper-parameters unchanged. This strategy yields a simply designed network consisting of parallel sub-modules with the same structure. While traditional CNNs go deeper and wider to increase accuracy, our proposed method is more effective with a simple design that is easier to optimize for practical applications. Experiments are conducted on the Flickr8k, Flickr30k, and MSCOCO datasets. Results demonstrate that our method achieves state-of-the-art performance in terms of caption quality.
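A minimal sketch of what such a multi-branch block might look like, assuming ResNeXt-style parallel branches of identical structure whose outputs are summed; the branch count and layer sizes here are illustrative choices, not the paper's.

```python
import torch
import torch.nn as nn

class MultiBranchBlock(nn.Module):
    """Parallel sub-modules of identical structure; outputs are summed.

    Adding branches raises capacity without changing the per-branch
    hyper-parameters, in the spirit the abstract describes.
    """
    def __init__(self, channels: int, branches: int = 4, width: int = 16):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(channels, width, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(width, width, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(width, channels, kernel_size=1),
            )
        self.branches = nn.ModuleList(branch() for _ in range(branches))

    def forward(self, x):
        # Identical branches run in parallel; their outputs are aggregated.
        return x + sum(b(x) for b in self.branches)

x = torch.randn(1, 64, 32, 32)
print(MultiBranchBlock(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```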

19 pages, 8452 KiB  
Article
Real-Time Object Detection in Remote Sensing Images Based on Visual Perception and Memory Reasoning
by Xia Hua, Xinqing Wang, Ting Rui, Dong Wang and Faming Shao
Electronics 2019, 8(10), 1151; https://doi.org/10.3390/electronics8101151 - 11 Oct 2019
Cited by 9 | Viewed by 4076
Abstract
Aiming at the real-time detection of multiple objects and micro-objects in large-scene remote sensing images, we propose a cascaded convolutional neural network real-time object-detection framework that integrates visual perception with convolutional memory network reasoning. The framework is composed of two fully convolutional networks: the strengthened object self-attention pre-screening fully convolutional network (SOSA-FCN) and the object accurate detection fully convolutional network (AD-FCN). SOSA-FCN introduces a self-attention module to extract attention feature maps and constructs a depth feature pyramid that refines them with convolutional long- and short-term memory networks. It guides the search toward potential object sub-regions in the scene, reduces computational complexity, and strengthens the network's ability to extract multi-scale object features, adapting to the complex backgrounds and small objects characteristic of large-scene remote sensing imagery. In AD-FCN, an object mask and an object orientation estimation layer are designed to achieve fine positioning of candidate boxes. The performance of the proposed algorithm is compared with that of other advanced methods on NWPU VHR-10, DOTA, UCAS-AOD, and other open datasets. The experimental results show that the proposed algorithm significantly improves the efficiency of object detection while ensuring detection accuracy, adapts well across datasets, and has broad prospects for engineering applications.
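The pre-screening idea, weighting a feature map by a learned attention map and keeping only high-response regions for detailed detection, can be sketched generically. This is a simplified illustration of spatial-attention pre-screening, not SOSA-FCN itself; the 1x1-conv scoring and the keep fraction are assumptions.

```python
import torch
import torch.nn as nn

class AttentionPrescreen(nn.Module):
    """Generic spatial-attention pre-screening (illustrative only).

    Scores each spatial location with a 1x1 conv, normalizes the scores
    into an attention map, and returns the re-weighted features plus a
    binary mask of candidate sub-regions worth detailed detection.
    """
    def __init__(self, channels: int, keep_fraction: float = 0.25):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)
        self.keep_fraction = keep_fraction

    def forward(self, feats):
        b, _, h, w = feats.shape
        attn = torch.softmax(self.score(feats).view(b, -1), dim=1).view(b, 1, h, w)
        weighted = feats * attn
        # Keep only the top-scoring locations as candidate regions.
        k = max(1, int(self.keep_fraction * h * w))
        thresh = attn.view(b, -1).topk(k, dim=1).values[:, -1].view(b, 1, 1, 1)
        mask = (attn >= thresh).float()
        return weighted, mask

feats = torch.randn(1, 256, 64, 64)
weighted, mask = AttentionPrescreen(256)(feats)
print(weighted.shape, int(mask.sum()))
```

Restricting the expensive detection stage to the masked sub-regions is what reduces the overall computational complexity on large scenes.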

12 pages, 3522 KiB  
Article
Detection of Wildfire Smoke Images Based on a Densely Dilated Convolutional Network
by Tingting Li, Enting Zhao, Junguo Zhang and Chunhe Hu
Electronics 2019, 8(10), 1131; https://doi.org/10.3390/electronics8101131 - 07 Oct 2019
Cited by 34 | Viewed by 4312
Abstract
Recently, many researchers have attempted to use convolutional neural networks (CNNs) for wildfire smoke detection. However, the application of CNNs to wildfire smoke detection still faces several issues, e.g., a high false-alarm rate and imbalanced training data. To address these issues, we propose a novel framework that integrates conventional methods into a CNN for wildfire smoke detection, consisting of a candidate smoke region segmentation strategy and an advanced network architecture, the wildfire smoke dilated DenseNet (WSDD-Net). Candidate smoke region segmentation removes the complex backgrounds of wildfire smoke images. The proposed WSDD-Net achieves multi-scale feature extraction by combining dilated convolutions with dense blocks. To address the dataset imbalance, an improved cross entropy loss function, balanced cross entropy (BCE), is used instead of the original cross entropy loss during training. WSDD-Net is evaluated on two smoke datasets, WS and Yuan, and achieves a high AR (99.20%) and a low FAR (0.24%). The experimental results demonstrate that the proposed framework has better detection capability under different negative-sample interferences.
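A balanced cross entropy can be written as a class-weighted form of the usual loss. Below is a minimal NumPy sketch, assuming the common formulation in which a weight beta derived from class frequencies scales the positive term; the exact weighting in the paper may differ.

```python
import numpy as np

def balanced_cross_entropy(p, y, beta):
    """Class-weighted binary cross entropy.

    p:    predicted smoke probabilities in (0, 1)
    y:    binary labels (1 = smoke, 0 = non-smoke)
    beta: weight on the positive class, e.g. the fraction of negative
          samples, so the rare class is not drowned out in training
    """
    eps = 1e-7
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(beta * y * np.log(p) + (1 - beta) * (1 - y) * np.log(1 - p))

y = np.array([1, 0, 0, 0, 0, 0, 0, 1])
p = np.array([0.7, 0.2, 0.1, 0.3, 0.2, 0.1, 0.4, 0.6])
beta = 1 - y.mean()  # positive class is rare, so weight it up
print(balanced_cross_entropy(p, y, beta))
```

With beta = 0.5 this reduces to the ordinary cross entropy; raising beta penalizes missed smoke more than false alarms.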

13 pages, 1194 KiB  
Article
Efficient Implementation of 2D and 3D Sparse Deconvolutional Neural Networks with a Uniform Architecture on FPGAs
by Deguang Wang, Junzhong Shen, Mei Wen and Chunyuan Zhang
Electronics 2019, 8(7), 803; https://doi.org/10.3390/electronics8070803 - 18 Jul 2019
Cited by 11 | Viewed by 3224
Abstract
Three-dimensional (3D) deconvolution is widely used in many computer vision applications. However, most previous work has focused on accelerating two-dimensional (2D) deconvolutional neural networks (DCNNs) on Field-Programmable Gate Arrays (FPGAs), while the acceleration of 3D DCNNs, which have higher computational complexity and sparsity than 2D DCNNs, has not been studied in depth. In this paper, we focus on accelerating both 2D and 3D sparse DCNNs on FPGAs by proposing efficient schemes for mapping them onto a uniform architecture. Firstly, a pruning method removes unimportant network connections and increases the sparsity of the weights; after pruning, the number of DCNN parameters is reduced significantly without accuracy loss. Secondly, the remaining non-zero weights are encoded in coordinate (COO) format, reducing the memory demands of the parameters. Finally, to demonstrate the effectiveness of our work, we implement our accelerator design on the Xilinx VC709 evaluation platform for four real-life 2D and 3D DCNNs. After the first two steps, the storage required by the DCNNs is reduced by up to 3.9×. Results show that our method on the accelerator outperforms our prior work by 2.5× to 3.6× in latency.
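A compact NumPy sketch of the prune-then-encode pipeline the abstract describes; the magnitude threshold and tensor shape are illustrative choices, not the paper's.

```python
import numpy as np

def prune_and_encode_coo(weights, threshold):
    """Zero out small weights, then store the survivors in COO format.

    Returns (coords, values): coords is an (nnz, ndim) array of indices
    and values holds the matching non-zero weights, which is what a
    sparse accelerator streams instead of the dense tensor.
    """
    pruned = np.where(np.abs(weights) >= threshold, weights, 0.0)
    coords = np.argwhere(pruned != 0).astype(np.int32)
    values = pruned[pruned != 0].astype(np.float32)
    return coords, values

w = np.random.randn(64, 64, 3, 3).astype(np.float32)
coords, values = prune_and_encode_coo(w, threshold=1.0)
print(f"kept {values.size}/{w.size} weights, "
      f"{w.nbytes / (coords.nbytes + values.nbytes):.1f}x storage reduction")
```

The same encoding applies unchanged to 2D and 3D kernels, since COO only stores index tuples and values, which is what makes a uniform architecture possible.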

20 pages, 3477 KiB  
Article
Jet Features: Hardware-Friendly, Learned Convolutional Kernels for High-Speed Image Classification
by Taylor Simons and Dah-Jye Lee
Electronics 2019, 8(5), 588; https://doi.org/10.3390/electronics8050588 - 27 May 2019
Cited by 3 | Viewed by 3398
Abstract
This paper explores a set of learned convolutional kernels that we call Jet Features. Jet Features are efficient to compute in software, easy to implement in hardware, and perform well on visual inspection tasks. Because Jet Features can be learned, they can be used in machine learning algorithms. Using Jet Features, we make significant improvements over our previous work, the Evolution Constructed Features (ECO Features) algorithm. Not only do we gain a 3.7× speedup in software without losing any accuracy on the CIFAR-10 and MNIST datasets, but Jet Features also allow us to implement the algorithm on an FPGA using only a fraction of its resources. We hope to apply the benefits of Jet Features to convolutional neural networks in the future.
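The abstract does not spell out the kernel construction, so the sketch below is purely illustrative of "hardware-friendly" kernels in this spirit: it composes separable 2D kernels from the two-tap averaging filter [1, 1] and the two-tap difference filter [1, -1], which need only additions and subtractions.

```python
import numpy as np

AVG = np.array([1.0, 1.0])    # two-tap smoothing: one add
DIFF = np.array([1.0, -1.0])  # two-tap derivative: one subtract

def jet_like_kernel(x_taps, y_taps):
    """Compose a 2D kernel from sequences of two-tap 1D filters.

    x_taps / y_taps are sequences of AVG or DIFF applied along each
    axis; the result is separable and multiplier-free, the kind of
    structure that is cheap in an FPGA datapath.
    """
    kx = np.array([1.0])
    for t in x_taps:
        kx = np.convolve(kx, t)
    ky = np.array([1.0])
    for t in y_taps:
        ky = np.convolve(ky, t)
    return np.outer(ky, kx)

# A smoothed horizontal-edge kernel (illustrative, not from the paper);
# this particular composition reproduces the classic Sobel kernel.
print(jet_like_kernel([AVG, AVG], [DIFF, AVG]))
```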

22 pages, 3964 KiB  
Article
An Efficient Streaming Accelerator for Low Bit-Width Convolutional Neural Networks
by Qinyu Chen, Yuxiang Fu, Wenqing Song, Kaifeng Cheng, Zhonghai Lu, Chuan Zhang and Li Li
Electronics 2019, 8(4), 371; https://doi.org/10.3390/electronics8040371 - 27 Mar 2019
Cited by 4 | Viewed by 3764
Abstract
Convolutional Neural Networks (CNNs) have been widely applied in various fields, such as image recognition and speech processing, as well as in many big-data analysis tasks. However, their large size and intensive computation hinder their deployment in hardware, especially on embedded systems with stringent latency, power, and area requirements. To address this issue, low bit-width CNNs have been proposed as a highly competitive candidate. In this paper, we propose an efficient, scalable accelerator for low bit-width CNNs based on a parallel streaming architecture. With a novel coarse-grain task partitioning (CGTP) strategy, the proposed accelerator, with heterogeneous computing units supporting multi-pattern dataflows, can nearly double the throughput for various CNN models on average. In addition, a hardware-friendly algorithm is proposed to simplify the activation and quantification process, reducing power dissipation and area overhead. Based on the optimized algorithm, an efficient reconfigurable three-stage activation-quantification-pooling (AQP) unit with a low-power staged blocking strategy is developed, which can process activation, quantification, and max-pooling operations simultaneously. Moreover, an interleaving memory scheduling scheme is proposed to support the streaming architecture. The accelerator is implemented in TSMC 40 nm technology with a core size of 0.17 mm². It achieves 7.03 TOPS/W energy efficiency and 4.14 TOPS/mm² area efficiency at 100.1 mW, which makes it a promising design for embedded devices.
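The fused AQP idea, applying the activation, quantizing to a low bit-width, and max-pooling in one pass over the data, can be sketched in NumPy. The ReLU activation, 2-bit uniform quantizer, and 2x2 pooling window below are assumptions for illustration, not the paper's exact unit.

```python
import numpy as np

def aqp(x, bits=2, pool=2):
    """Fused activation -> quantification -> max-pooling, one pass.

    x: (H, W) feature map with H, W divisible by `pool`.
    Returns low bit-width integer codes of the pooled activations.
    """
    x = np.maximum(x, 0.0)                      # activation (ReLU)
    levels = (1 << bits) - 1                    # e.g. 2-bit -> codes 0..3
    scale = x.max() / levels if x.max() > 0 else 1.0
    q = np.round(x / scale).astype(np.uint8)    # uniform quantification
    h, w = q.shape
    q = q.reshape(h // pool, pool, w // pool, pool)
    return q.max(axis=(1, 3))                   # max-pooling

fmap = np.random.randn(8, 8).astype(np.float32)
print(aqp(fmap))
```

Because the activation and the quantizer are both monotone, pooling after quantization gives the same result as pooling first, which is what makes a fused single-pass unit safe.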

15 pages, 2318 KiB  
Article
Optimized Compression for Implementing Convolutional Neural Networks on FPGA
by Min Zhang, Linpeng Li, Hai Wang, Yan Liu, Hongbo Qin and Wei Zhao
Electronics 2019, 8(3), 295; https://doi.org/10.3390/electronics8030295 - 06 Mar 2019
Cited by 50 | Viewed by 7186
Abstract
Field-programmable gate arrays (FPGAs) are widely considered a promising platform for convolutional neural network (CNN) acceleration. However, the large number of parameters in CNNs places heavy computing and memory burdens on FPGA-based implementations. To solve this problem, this paper proposes an optimized compression strategy and realizes an FPGA-based accelerator for CNNs. Firstly, a reversed-pruning strategy is proposed that reduces the number of parameters of AlexNet by a factor of 13× without accuracy loss on the ImageNet dataset, and peak-pruning is further introduced for better compressibility. Quantization then contributes another 4× with negligible loss of accuracy. Secondly, efficient storage techniques are presented that reduce the cache overhead of the convolutional and fully connected layers, respectively. Finally, the effectiveness of the proposed strategy is verified with an accelerator implemented on a Xilinx ZCU104 evaluation board. By improving existing pruning techniques and the storage format for sparse data, we significantly reduce the size of AlexNet by 28×, from 243 MB to 8.7 MB. In addition, the overall performance of our accelerator reaches 9.73 fps for the compressed AlexNet. Compared with central processing unit (CPU) and graphics processing unit (GPU) platforms, our implementation achieves 182.3× and 1.1× improvements in latency and throughput, respectively, on the convolutional (CONV) layers of AlexNet, with 822.0× and 15.8× improvements in energy efficiency, respectively. This compression strategy provides a reference for other neural network applications, including CNNs, long short-term memory (LSTM) networks, and recurrent neural networks (RNNs).
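The two compression stages compose multiplicatively: pruning shrinks the parameter count, quantization shrinks the bits per surviving parameter. A minimal sketch, assuming simple magnitude pruning and 8-bit linear quantization rather than the paper's reversed-/peak-pruning specifics:

```python
import numpy as np

def compress(weights, keep_ratio=1 / 13, bits=8):
    """Magnitude pruning followed by linear quantization.

    keep_ratio: fraction of weights to keep (13x pruning ~ keep 1/13).
    bits:       quantized width (8-bit vs. float32 gives the extra 4x).
    Returns the codes, scale, mask, and the combined compression factor.
    """
    flat = np.abs(weights).ravel()
    k = max(1, int(keep_ratio * flat.size))
    threshold = np.partition(flat, -k)[-k]          # k-th largest magnitude
    mask = np.abs(weights) >= threshold
    kept = weights * mask
    scale = np.abs(kept).max() / (2 ** (bits - 1) - 1)
    codes = np.round(kept / scale).astype(np.int8)
    factor = (weights.size * 32) / (mask.sum() * bits)
    return codes, scale, mask, factor

w = np.random.randn(4096, 4096).astype(np.float32)
*_, factor = compress(w)
print(f"~{factor:.0f}x smaller (ignoring sparse-index overhead)")
```

The gap between this idealized factor and the paper's end-to-end 28× is the index overhead of the sparse storage format, which their storage technique is designed to minimize.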

18 pages, 5742 KiB  
Article
An FPGA-Based CNN Accelerator Integrating Depthwise Separable Convolution
by Bing Liu, Danyin Zou, Lei Feng, Shou Feng, Ping Fu and Junbao Li
Electronics 2019, 8(3), 281; https://doi.org/10.3390/electronics8030281 - 03 Mar 2019
Cited by 69 | Viewed by 9289
Abstract
The Convolutional Neural Network (CNN) has been used in many fields and has achieved remarkable results in tasks such as image classification, face detection, and speech recognition. Compared to GPU (graphics processing unit) and ASIC implementations, an FPGA (field-programmable gate array)-based CNN accelerator has great advantages due to its low power consumption and reconfigurability. However, the FPGA's extremely limited resources and the CNN's huge number of parameters and computational complexity pose great challenges to the design. Based on the ZYNQ heterogeneous platform, and balancing resource and bandwidth constraints with the roofline model, the CNN accelerator we designed can accelerate both standard convolution and depthwise separable convolution with high hardware resource utilization. The accelerator handles network layers of different scales through parameter configuration, and it maximizes bandwidth and achieves full pipelining by using a data-stream interface and ping-pong on-chip caches. The experimental results show that our accelerator achieves 17.11 GOPS for 32-bit floating point while also accelerating depthwise separable convolution, which gives it clear advantages over other designs.
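Depthwise separable convolution splits a standard convolution into a per-channel (depthwise) spatial filter followed by a 1x1 (pointwise) channel mixer, cutting the multiply count dramatically. A minimal PyTorch sketch of the operation the accelerator targets; the layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 per channel, then pointwise 1x1 across channels.

    Versus a dense 3x3 convolution, the multiply count drops by roughly
    a factor of (1/out_ch + 1/9), the saving the accelerator exploits.
    """
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3,
                                   padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 32, 56, 56)
print(DepthwiseSeparableConv(32, 64)(x).shape)  # torch.Size([1, 64, 56, 56])
```

The two stages have very different compute-to-bandwidth ratios, which is why supporting both efficiently on one datapath is the interesting design problem.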

18 pages, 4563 KiB  
Article
Energy-Efficient Gabor Kernels in Neural Networks with Genetic Algorithm Training Method
by Fanjie Meng, Xinqing Wang, Faming Shao, Dong Wang and Xia Hua
Electronics 2019, 8(1), 105; https://doi.org/10.3390/electronics8010105 - 18 Jan 2019
Cited by 21 | Viewed by 6041
Abstract
Deep-learning convolutional neural networks (CNNs) with multilayer structures have proven successful in various cognitive applications. Their high computational energy and time requirements, however, hinder practical deployment; hence, realizing a highly energy-efficient, fast-learning neural network is of great interest. In this work, we address the computing-resource problem by developing a deep model, termed the Gabor convolutional neural network (Gabor CNN), which incorporates highly expression-efficient Gabor kernels into CNNs. To effectively imitate the structural characteristics of traditional weight kernels, we improve upon traditional Gabor filters to obtain stronger frequency and orientation representations. In addition, we propose a procedure for training Gabor CNNs, termed the fast training method (FTM). In FTM, we design a new training scheme based on a multipopulation genetic algorithm (MPGA) and an evaluation structure to optimize the improved Gabor kernels, while training the remaining Gabor CNN parameters with back-propagation. Training the improved Gabor kernels with MPGA is much more energy-efficient, requiring fewer samples and iterations. Simple tasks, such as character recognition on the Mixed National Institute of Standards and Technology database (MNIST), traffic sign recognition on the German Traffic Sign Recognition Benchmark (GTSRB), and face detection on the Olivetti Research Laboratory database (ORL), are implemented using the LeNet architecture. The experimental results for the Gabor CNN with the MPGA training method show a 17–19% reduction in computational energy and time and an 18–21% reduction in storage requirements, with less than a 1% accuracy decrease. By incorporating highly expression-efficient Gabor kernels into CNNs, we eliminate a significant fraction of the computation-hungry components of the training process.
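For reference, the standard real Gabor kernel that such a layer parameterizes, a sinusoid modulated by a Gaussian envelope, can be generated in a few lines of NumPy. This is the textbook filter, not the authors' improved variant.

```python
import numpy as np

def gabor_kernel(size, sigma, theta, lam, gamma=0.5, psi=0.0):
    """Real-valued Gabor kernel on a size x size grid.

    sigma: Gaussian envelope width      theta: orientation (radians)
    lam:   sinusoid wavelength          gamma: spatial aspect ratio
    psi:   phase offset
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    xr = x * np.cos(theta) + y * np.sin(theta)   # rotate coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / lam + psi)
    return envelope * carrier

# A small bank spanning four orientations, as a Gabor layer might use.
bank = [gabor_kernel(7, sigma=2.0, theta=t, lam=4.0)
        for t in np.linspace(0, np.pi, 4, endpoint=False)]
print(bank[0].round(2))
```

Because the whole kernel is determined by a handful of parameters (sigma, theta, lambda, gamma, psi) rather than size x size free weights, a genetic algorithm has a very small search space to optimize, which is the source of the training savings.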

19 pages, 752 KiB  
Article
A Uniform Architecture Design for Accelerating 2D and 3D CNNs on FPGAs
by Zhiqiang Liu, Paul Chow, Jinwei Xu, Jingfei Jiang, Yong Dou and Jie Zhou
Electronics 2019, 8(1), 65; https://doi.org/10.3390/electronics8010065 - 07 Jan 2019
Cited by 43 | Viewed by 4721
Abstract
Three-dimensional convolutional neural networks (3D CNNs) have gained popularity in many complex computer vision applications. Many customized FPGA-based accelerators have been proposed for 2D CNNs, but very few for 3D CNNs, which are far more computationally intensive; the extra dimension further expands the design space, making 3D CNN acceleration on FPGAs a significant challenge. Motivated by the finding that the computation patterns of 2D and 3D CNNs are very similar, we propose a uniform architecture design for accelerating both 2D and 3D CNNs in this paper. The uniform architecture is based on the idea of mapping convolutions to matrix multiplications. A customized mapping module generates the feature-matrix tilings without storing the entire enlarged feature matrix on-chip or off-chip, a splitting strategy reconstructs a convolutional layer to fit the on-chip memory capacity, and a 2D multiply-and-accumulate (MAC) array computes the matrix multiplications efficiently. For demonstration, we implement an accelerator prototype with a high-level synthesis (HLS) methodology on a Xilinx VC709 board and test it on three typical CNN models: AlexNet, VGG16, and C3D. Experimental results show that the accelerator achieves state-of-the-art throughput on both 2D and 3D CNNs, with much better energy efficiency than a CPU or GPU.
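The conv-as-matmul mapping underpinning the uniform architecture (commonly called im2col) is easy to state in NumPy. This generic sketch covers 2D convolution; the same unrolling extends to 3D by adding a depth axis, which is what makes one architecture serve both.

```python
import numpy as np

def conv2d_as_matmul(image, kernels):
    """Lower a 2D convolution to one matrix multiplication (im2col).

    image:   (C, H, W) input feature map
    kernels: (M, C, K, K) filter bank
    Returns the (M, H-K+1, W-K+1) output, stride 1, no padding.
    """
    C, H, W = image.shape
    M, _, K, _ = kernels.shape
    oh, ow = H - K + 1, W - K + 1
    # Each output position becomes one column of C*K*K unrolled inputs.
    cols = np.empty((C * K * K, oh * ow))
    for i in range(oh):
        for j in range(ow):
            cols[:, i * ow + j] = image[:, i:i + K, j:j + K].ravel()
    weights = kernels.reshape(M, C * K * K)   # one row per filter
    return (weights @ cols).reshape(M, oh, ow)

img = np.random.randn(3, 8, 8)
filt = np.random.randn(4, 3, 3, 3)
print(conv2d_as_matmul(img, filt).shape)  # (4, 6, 6)
```

The enlarged `cols` matrix is the "entire enlarged feature matrix" the abstract refers to; the paper's mapping module generates its tiles on the fly instead of materializing it, as this sketch does.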

Review


25 pages, 376 KiB  
Review
A Review of Binarized Neural Networks
by Taylor Simons and Dah-Jye Lee
Electronics 2019, 8(6), 661; https://doi.org/10.3390/electronics8060661 - 12 Jun 2019
Cited by 151 | Viewed by 11878
Abstract
In this work, we review Binarized Neural Networks (BNNs). BNNs are deep neural networks that use binary values for activations and weights instead of full-precision values. With binary values, BNNs can execute computations using bitwise operations, which reduces execution time, and BNN model sizes are much smaller than their full-precision counterparts. While a BNN's accuracy is generally lower than that of a full-precision model, BNNs have been closing the accuracy gap and are becoming more accurate on larger datasets such as ImageNet. BNNs are also good candidates for deep learning implementations on FPGAs and ASICs due to their bitwise efficiency. We give a tutorial on the general BNN methodology and review various contributions, implementations, and applications of BNNs.
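The bitwise trick at the heart of BNNs: with weights and activations in {-1, +1} packed as bits (1 for +1, 0 for -1), a dot product becomes XNOR followed by popcount. A small NumPy sketch, with the packing convention as our assumption:

```python
import numpy as np

def binary_dot(a, w):
    """Dot product of two {-1,+1} vectors via XNOR + popcount.

    Signs are packed as bits (1 for +1, 0 for -1). XNOR counts the
    positions where the signs agree; with m agreements out of n, the
    {-1,+1} dot product equals m - (n - m) = 2m - n.
    """
    n = a.size
    a_bits = np.packbits(a > 0)          # uint8 words, zero-padded
    w_bits = np.packbits(w > 0)
    xnor = ~(a_bits ^ w_bits)
    # Clear the padding bits appended by packbits in the last byte.
    pad = (-n) % 8
    if pad:
        xnor[-1] &= np.uint8((0xFF << pad) & 0xFF)
    matches = int(np.unpackbits(xnor).sum())
    return 2 * matches - n

a = np.random.choice([-1, 1], 100)
w = np.random.choice([-1, 1], 100)
assert binary_dot(a, w) == int(a @ w)
print(binary_dot(a, w))
```

One XNOR plus one popcount replaces dozens of multiply-accumulates per machine word, which is why BNNs map so well onto FPGA and ASIC fabrics.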
