System-on-Chip (SoC) Design and Its Applications

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Circuit and Signal Processing".

Deadline for manuscript submissions: closed (31 December 2021) | Viewed by 44386

Special Issue Editor


Guest Editor
School of Electronics and Information Engineering, Korea Aerospace University, Goyang-si 10540, Korea
Interests: system-on-chip (SoC) design; VLSI signal processing; HW accelerator for AI learning and inference; HW/SW co-design

Special Issue Information

Dear Colleagues:

In recent decades, innovative system-on-chip (SoC) design has become increasingly important because the market demands small, low-power products. In particular, SoC design has rapidly evolved from simple single-core systems to complex systems with many heterogeneous cores that communicate and cooperate via complex on-chip networks and shared resources. In addition, artificial intelligence is essential for various SoC applications, such as autonomous vehicles, the Internet of Things, medical/healthcare, and consumer electronics.

The main aim of this Special Issue is to attract high-quality research articles as well as review articles on recent progress in “SoC and Its Applications.” Topics in this Special Issue include (but are not limited to):

  • Circuits for SoC: RF, analog, digital, mixed-signal circuit IP for SoC
  • Signal processing for SoC: analog/digital VLSI signal processing for SoC design
  • Low-power design: low-power design methodology, power/energy management, energy harvesting
  • MPSoC architecture: on-chip interconnect, network-on-chip, memory architecture for multicore computing, platform architectures
  • SoC for AI applications: machine learning, deep neural network, neuromorphic computing
  • SoC for intelligent systems: automotive, IoT, medical/healthcare, wired/wireless communications, consumer electronics, etc.

Prof. Dr. Yunho Jung
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (13 papers)

Research

20 pages, 16411 KiB  
Article
Analog Frontend for Reliable Human Body Temperature Measurement for IoT Devices
by Paweł Narczyk and Witold Adam Pleskacz
Electronics 2022, 11(3), 434; https://doi.org/10.3390/electronics11030434 - 31 Jan 2022
Cited by 1 | Viewed by 2403
Abstract
In this paper, an analog frontend for reliable measurement of human body temperature is presented. A novel temperature calibration technique using an on-chip resistor was developed specifically for the analog frontend. The analog frontend consists of a bandgap current reference, a precision current source, a programmable gain amplifier, a voltage source proportional to absolute temperature, and an on-chip calibration resistor. The developed calibration technique enables very high accuracy, down to 0.1 °C, over a very wide temperature range. The calibration method was developed for integrated circuits operating at temperatures from −40 °C to +125 °C. The presented analog frontend consumes no more than 185 μA and was designed and manufactured in United Microelectronics Corporation (UMC) 130 nm CMOS technology. The data presented in the paper were obtained from process corner and Monte Carlo simulations as well as from measurements. The measurements were taken using a manual wafer prober with a climate-controlled microchamber, at temperatures ranging from −40 °C to +125 °C.
(This article belongs to the Special Issue System-on-Chip (SoC) Design and Its Applications)

17 pages, 1682 KiB  
Article
CCSDS 131.2-B-1 Transmitter Design on FPGA with Adaptive Coding and Modulation Schemes for Satellite Communications
by Adrián Lamoral Coines and Víctor P. Gil Jiménez
Electronics 2021, 10(20), 2476; https://doi.org/10.3390/electronics10202476 - 12 Oct 2021
Cited by 5 | Viewed by 2338
Abstract
Satellite communications are a well-established research area in which the main innovation of the last decade has been the use of multi-carrier modulations and more robust channel coding techniques. However, in recent years, novel advanced signal processing has started to be developed for these communications owing to the increase in the signal processing capacity of transmitters and receivers. Although signal processing capabilities are increasing, they are still severely constrained because these techniques need to be implemented in real hardware, which makes complexity a matter of critical importance. Therefore, this paper presents the design and implementation of a transmitter with adaptable coding and modulation on a field-programmable gate array (FPGA). The main motivation came from the CCSDS 131.2-B-1 standard, which recommends such a novel transmitter, which to date has not been implemented in a real system. The system was modeled in MATLAB with the purpose of being programmed in VHDL following the AXI-stream protocol between components. Behavioral simulation results were obtained in VIVADO and compared with MATLAB for verification purposes. The transmitter logic circuit was synthesized on a Zynq Ultrascale RFSoC ZU28DR FPGA, showing low resource consumption and correct functioning, leading us to conclude that the deployment of new communication systems in state-of-the-art hardware in satellite communications is justified.

13 pages, 1160 KiB  
Article
COREA: Delay- and Energy-Efficient Approximate Adder Using Effective Carry Speculation
by Hyelin Seok, Hyoju Seo, Jungwon Lee and Yongtae Kim
Electronics 2021, 10(18), 2234; https://doi.org/10.3390/electronics10182234 - 12 Sep 2021
Cited by 10 | Viewed by 2134
Abstract
This paper presents a delay- and energy-efficient approximate adder design exploiting an effective carry speculation scheme with error reduction. The proposed scheme reduces the delay and improves the energy efficiency without any significant accuracy degradation by effectively adding the predicted carry input using the OR operation. Additionally, the error reduction technique improves the overall computation accuracy at the expense of a few logic gates. As a result, the proposed adder achieves 3.84- and 7.79-times greater energy and energy-delay product (EDP) efficiencies than the traditional adder when implemented in 65-nm CMOS technology. In particular, when jointly analyzed with hardware accuracy, our design attains 69% and 70% reductions of the energy- and EDP-normalized mean error distance (NMED) products, respectively, compared to the other approximate adders under consideration. Furthermore, the proposed adder’s efficacy over the existing adders is demonstrated by adopting it in a machine learning application.

12 pages, 20988 KiB  
Article
FPGA Implementation of the Range-Doppler Algorithm for Real-Time Synthetic Aperture Radar Imaging
by Yeongung Choi, Dongmin Jeong, Myeongjin Lee, Wookyung Lee and Yunho Jung
Electronics 2021, 10(17), 2133; https://doi.org/10.3390/electronics10172133 - 02 Sep 2021
Cited by 7 | Viewed by 3450
Abstract
In this paper, we propose a range-Doppler algorithm (RDA)-based synthetic aperture radar (SAR) processor for real-time SAR imaging and present FPGA-based implementation results. The processing steps for the RDA include range compression, range cell migration correction (RCMC), and azimuth compression. A matched filtering unit (MFU) and an RCMC processing unit (RPU) are required for real-time processing. Therefore, the proposed RDA-based SAR processor contains an MFU that uses the mixed-radix multi-path delay commutator (MRMDC) FFT and an RPU. The MFU reduces the memory requirements by applying a decimation-in-frequency (DIF) FFT and decimation-in-time (DIT) IFFT. The RPU provides a variable tap size and variable interpolation kernel. In addition, the MFU and RPU are designed to enable parallel processing of four 32-bit data items, which are transferred via a 128-bit AXI bus. The proposed RDA-based SAR processor was designed using Verilog-HDL and implemented in a Xilinx UltraScale+ MPSoC FPGA device. After comparing the execution time taken by the proposed SAR processor with that taken by an ARM Cortex-A53 microprocessor, we observed an 85-fold speedup for a 2048 × 2048 pixel image. A performance evaluation based on related studies indicated that the proposed processor achieved an execution time that was approximately 6.5 times shorter than those of previous FPGA implementations of RDA processors.

15 pages, 1898 KiB  
Article
AERO: A 1.28 MOP/s/LUT Reconfigurable Inference Processor for Recurrent Neural Networks in a Resource-Limited FPGA
by Jinwon Kim, Jiho Kim and Tae-Hwan Kim
Electronics 2021, 10(11), 1249; https://doi.org/10.3390/electronics10111249 - 24 May 2021
Cited by 5 | Viewed by 2016
Abstract
This study presents a resource-efficient reconfigurable inference processor for recurrent neural networks (RNNs), named AERO. AERO is programmable to perform inference on RNN models of various types. It was designed based on an instruction-set architecture specialized for processing the primitive vector operations that compose the dataflows of RNN models. A versatile vector-processing unit (VPU) was incorporated to perform every vector operation and achieve high resource efficiency. To keep resource usage low, the multiplication in the VPU is carried out on the basis of an approximation scheme. In addition, the activation functions are realized with reduced tables. We developed a prototype inference system based on AERO using a resource-limited field-programmable gate array, under which the functionality of AERO was verified extensively for inference tasks based on several RNN models of different types. The resource efficiency of AERO was found to be as high as 1.28 MOP/s/LUT, which is 1.3 times higher than the previous state-of-the-art result.

15 pages, 9268 KiB  
Article
Scalable ESPRIT Processor for Direction-of-Arrival Estimation of Frequency Modulated Continuous Wave Radar
by Yongchul Jung, Hohyub Jeon, Seongjoo Lee and Yunho Jung
Electronics 2021, 10(6), 695; https://doi.org/10.3390/electronics10060695 - 16 Mar 2021
Cited by 12 | Viewed by 2871
Abstract
The estimation of signal parameters via rotational invariance techniques (ESPRIT) is an algorithm that uses the shift-invariant properties of the array antenna to estimate the direction-of-arrival (DOA) of signals received at the array antenna. Since the ESPRIT algorithm requires high-complexity operations such as covariance matrix computation and eigenvalue decomposition, a hardware processor must be implemented so that the DOA is estimated in real time. Additionally, the ESPRIT processor should support a scalable number of antennas for DOA estimation in various applications because the performance of ESPRIT depends on the number of antennas. Therefore, we propose an ESPRIT processor that supports scalable antenna configurations of two to eight antennas. In addition, since the proposed ESPRIT processor is based on the multiple invariances (MI) algorithm, it can achieve a much better performance than the existing ESPRIT processor. The execution time is reduced by simplifying the Jacobi method, which has the most significant computational complexity in calculating the eigenvalue decomposition (EVD) in ESPRIT. Moreover, the ESPRIT processor was designed using a hardware description language (HDL), and an FPGA-based verification was performed. The proposed ESPRIT processor was implemented with 10,088 slice registers, 18,207 LUTs, and 80 DSPs, and the slice register, LUT, and DSP counts were reduced by up to 71.45%, 54.5%, and 68.38%, respectively, compared to the existing structure.

14 pages, 1293 KiB  
Article
A Modified KNN Algorithm for High-Performance Computing on FPGA of Real-Time m-QAM Demodulators
by David Marquez-Viloria, Luis Castano-Londono and Neil Guerrero-Gonzalez
Electronics 2021, 10(5), 627; https://doi.org/10.3390/electronics10050627 - 09 Mar 2021
Cited by 7 | Viewed by 3094
Abstract
A methodology for scalable and concurrent real-time implementation of highly recurrent algorithms is presented and experimentally validated using the AWS-FPGA. This paper presents a parallel implementation of a KNN algorithm focused on m-QAM demodulators using high-level synthesis for fast prototyping, parameterization, and scalability of the design. The proposed design shows the successful implementation of the KNN algorithm for interchannel interference mitigation in a 3 × 16 Gbaud 16-QAM Nyquist WDM system. Additionally, we present a modified version of the KNN algorithm in which comparisons among data symbols are reduced by identifying the closest neighbor using the rule of the 8-connected clusters used for image processing. Real-time implementation of the modified KNN on a Xilinx Virtex UltraScale+ VU9P AWS-FPGA board was compared with the results obtained in previous work using the same data from the same experimental setup but with offline DSP in MATLAB. The results show that the difference is negligible below the FEC limit. Additionally, the modified KNN shows a reduction of operations from 43 percent to 75 percent, depending on the symbol’s position in the constellation, achieving a 47.25% reduction in total computational time for 100 K input symbols processed on 20 parallel cores compared to the KNN algorithm.
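As a rough illustration of the baseline idea in this abstract, the sketch below demodulates 16-QAM symbols with a plain k-nearest-neighbor vote. It is not the authors' modified algorithm (the 8-connected-cluster rule is not reproduced) and it is software only, not a high-level-synthesis design. The constellation levels, noise level, training-set size, and all function names are assumptions chosen for the example.

```python
import random
from collections import Counter
from typing import List, Tuple

Point = Tuple[float, float]

# Assumed ideal 16-QAM grid; the label of a symbol is its index in this list.
LEVELS = (-3.0, -1.0, 1.0, 3.0)
CONSTELLATION: List[Point] = [(i, q) for i in LEVELS for q in LEVELS]


def make_training_set(per_symbol: int = 20, sigma: float = 0.1,
                      seed: int = 0) -> List[Tuple[Point, int]]:
    """Hypothetical labeled training symbols: ideal points plus Gaussian noise."""
    rng = random.Random(seed)
    data = []
    for label, (i, q) in enumerate(CONSTELLATION):
        for _ in range(per_symbol):
            noisy = (i + rng.gauss(0.0, sigma), q + rng.gauss(0.0, sigma))
            data.append((noisy, label))
    return data


def knn_demodulate(rx: Point, training: List[Tuple[Point, int]], k: int = 5) -> int:
    """Return the majority label among the k training symbols closest to rx."""
    def squared_distance(item: Tuple[Point, int]) -> float:
        (x, y), _ = item
        return (x - rx[0]) ** 2 + (y - rx[1]) ** 2

    nearest = sorted(training, key=squared_distance)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

The plain version compares every received symbol against the whole training set, which is the comparison cost that the modified algorithm described in the abstract aims to reduce.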

11 pages, 1429 KiB  
Article
Area-Time Efficient Two-Dimensional Reconfigurable Integer DCT Architecture for HEVC
by Pramod Kumar Meher, Siew-Kei Lam, Thambipillai Srikanthan, Dong Hwan Kim and Sang Yoon Park
Electronics 2021, 10(5), 603; https://doi.org/10.3390/electronics10050603 - 05 Mar 2021
Cited by 4 | Viewed by 2247
Abstract
In this paper, we present area-time efficient reconfigurable architectures for the implementation of the integer discrete cosine transform (DCT), which supports all the transform lengths to be used in High Efficiency Video Coding (HEVC). We propose three 1D reconfigurable architectures that can be configured for the computation of the DCT of any of the prescribed lengths such as 4, 8, 16, and 32. It is shown that matrix multiplication schemes involving fewer adders can be used to derive parallel architectures for 1D integer DCT of different lengths. A novel transposition buffer is designed to be used for the proposed 2D DCT architecture, which offers double the throughput without increasing the size of the transposition buffer. We determine the optimal pipeline locations in the proposed design through the precise estimation of propagation delays and the critical path so that the area-delay-product is optimized and all the output samples are obtained in the same cycle in spite of the recursive nature of the structure. Implementation results show that the proposed 2D integer DCT architectures provide significantly higher throughput per unit area than the existing designs for HEVC.

18 pages, 3275 KiB  
Article
Low-Complexity High-Throughput QC-LDPC Decoder for 5G New Radio Wireless Communication
by Tram Thi Bao Nguyen, Tuy Nguyen Tan and Hanho Lee
Electronics 2021, 10(4), 516; https://doi.org/10.3390/electronics10040516 - 22 Feb 2021
Cited by 21 | Viewed by 5343
Abstract
This paper presents a pipelined layered quasi-cyclic low-density parity-check (QC-LDPC) decoder architecture targeting low complexity, high throughput, and efficient use of hardware resources, compliant with the specifications of the 5G new radio (NR) wireless communication standard. First, a combined min-sum (CMS) decoding algorithm, which is a combination of the offset min-sum and the original min-sum algorithm, is proposed. Then, a low-complexity and high-throughput pipelined layered QC-LDPC decoder architecture for enhanced mobile broadband specifications in 5G NR wireless standards, based on the CMS algorithm with pipelined layered scheduling, is presented. Enhanced versions of check node-based processor architectures are proposed to reduce the complexity of the LDPC decoders. An efficient minimum-finder for the check node unit architecture that reduces the hardware required for the computation of the first two minima is introduced. Moreover, a low-complexity a posteriori information update unit architecture, which requires only one adder array for its operations, is presented. The proposed architecture shows significant improvements in terms of area and throughput compared to other QC-LDPC decoder architectures available in the literature.
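To make the two ingredients named in this abstract concrete, the sketch below shows a textbook check-node update for min-sum decoding, with an optional offset that turns it into offset min-sum, computed from the first two minima of the input magnitudes. How the authors combine the two variants into CMS is not stated in the abstract, so that rule is not reproduced here. The function name and the `offset` parameter are assumptions for illustration.

```python
from typing import List


def check_node_update(llrs: List[float], offset: float = 0.0) -> List[float]:
    """Min-sum check-node update using the first two minima.

    llrs   : incoming variable-to-check messages (at least two values).
    offset : 0.0 gives plain min-sum; a positive value gives offset min-sum.
    Returns one outgoing message per edge, each excluding that edge's own input.
    """
    min1 = min2 = float("inf")  # smallest and second-smallest magnitudes
    idx1 = -1                   # position of the smallest magnitude
    sign_product = 1            # product of the signs of all inputs

    for i, value in enumerate(llrs):
        magnitude = abs(value)
        if value < 0:
            sign_product = -sign_product
        if magnitude < min1:
            min2, min1, idx1 = min1, magnitude, i
        elif magnitude < min2:
            min2 = magnitude

    outputs = []
    for i, value in enumerate(llrs):
        # The edge holding the global minimum sees the second minimum instead.
        magnitude = min2 if i == idx1 else min1
        magnitude = max(magnitude - offset, 0.0)
        # Multiplying by the edge's own sign removes it from the total product.
        own_sign = -1 if value < 0 else 1
        outputs.append(sign_product * own_sign * magnitude)
    return outputs
```

Because every outgoing magnitude is either the first or the second minimum, only those two values and one index need to be stored per check node, which is why a compact two-minima finder matters for hardware cost.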

13 pages, 1888 KiB  
Article
Reconfigurable Binary Neural Network Accelerator with Adaptive Parallelism Scheme
by Jaechan Cho, Yongchul Jung, Seongjoo Lee and Yunho Jung
Electronics 2021, 10(3), 230; https://doi.org/10.3390/electronics10030230 - 20 Jan 2021
Cited by 10 | Viewed by 3084
Abstract
Binary neural networks (BNNs) have attracted significant interest for the implementation of deep neural networks (DNNs) on resource-constrained edge devices, and various BNN accelerator architectures have been proposed to achieve higher efficiency. BNN accelerators can be divided into two categories: streaming and layer accelerators. Although streaming accelerators designed for a specific BNN network topology provide high throughput, they are infeasible for various sensor applications in edge AI because of their complexity and inflexibility. In contrast, layer accelerators with reasonable resources can support various network topologies, but they operate with the same parallelism for all the layers of the BNN, which degrades throughput performance at certain layers. To overcome this problem, we propose a BNN accelerator with adaptive parallelism that offers high throughput performance in all layers. The proposed accelerator analyzes target layer parameters and operates with optimal parallelism using reasonable resources. In addition, this architecture is able to fully compute all types of BNN layers thanks to its reconfigurability, and it can achieve a higher area–speed efficiency than existing accelerators. In performance evaluation using state-of-the-art BNN topologies, the designed BNN accelerator achieved an area–speed efficiency 9.69 times higher than previous FPGA implementations and 24% higher than existing VLSI implementations for BNNs.

13 pages, 4659 KiB  
Article
Area-Efficient Vision-Based Feature Tracker for Autonomous Hovering of Unmanned Aerial Vehicle
by Hyeon Kim, Jaechan Cho, Yongchul Jung, Seongjoo Lee and Yunho Jung
Electronics 2020, 9(10), 1591; https://doi.org/10.3390/electronics9101591 - 28 Sep 2020
Cited by 5 | Viewed by 2641
Abstract
In this paper, we propose a vision-based feature tracker for the autonomous hovering of an unmanned aerial vehicle (UAV) and present an area-efficient hardware architecture for its integration into a flight control system-on-chip, which is essential for small UAVs. The proposed feature tracker is based on the Shi–Tomasi algorithm for feature detection and the pyramidal Lucas–Kanade (PLK) algorithm for feature tracking. By applying an efficient hardware structure that leverages the common computations between the Shi–Tomasi and PLK algorithms, the proposed feature tracker offers good tracking performance with fewer hardware resources than existing feature tracker implementations. To evaluate the tracking performance of the proposed feature tracker, we compared it with the GPS-based trajectories of a drone in various flight environments, such as lawn, asphalt, and sidewalk blocks. The proposed tracker exhibited an average accuracy of 0.039 in terms of normalized root-mean-square error (NRMSE). The proposed feature tracker was designed using the Verilog hardware description language and implemented on a field-programmable gate array (FPGA). The proposed feature tracker uses 2744 slices, 25 DSPs, and 93 Kbit of memory and supports real-time processing at 417 FPS at an operating frequency of 130 MHz for 640 × 480 VGA images.

12 pages, 2031 KiB  
Article
High Efficiency Ring-LWE Cryptoprocessor Using Shared Arithmetic Components
by Tuy Nguyen Tan, Tram Thi Bao Nguyen and Hanho Lee
Electronics 2020, 9(7), 1075; https://doi.org/10.3390/electronics9071075 - 30 Jun 2020
Cited by 1 | Viewed by 2787
Abstract
A high-efficiency architecture for a ring learning with errors (ring-LWE) cryptoprocessor using shared arithmetic components is presented in this paper. By applying a novel approach for sharing the number theoretic transform (NTT) polynomial multiplier and polynomial adder between encryption and decryption operations, the total number of polynomial multipliers and polynomial adders used in the proposed ring-LWE cryptoprocessor is reduced. In addition, the processing time of the NTT polynomial multiplier is shortened by employing a multiple-path delay feedback (MDF) architecture and applying pipelining between all stages of the NTT processes. As a result, the proposed architecture offers a great reduction in hardware complexity and computation latency compared with existing works. The implementation result for the proposed ring-LWE cryptoprocessor on a Virtex-7 FPGA board using Xilinx VIVADO shows a significant decrease in the number of slices and LUTs compared with previous works. Moreover, the proposed ring-LWE cryptoprocessor offers higher throughput and efficiency than its predecessors.

13 pages, 1868 KiB  
Article
Design and Analysis of an Approximate Adder with Hybrid Error Reduction
by Hyoju Seo, Yoon Seok Yang and Yongtae Kim
Electronics 2020, 9(3), 471; https://doi.org/10.3390/electronics9030471 - 11 Mar 2020
Cited by 44 | Viewed by 5796
Abstract
This paper presents an energy-efficient approximate adder with a novel hybrid error reduction scheme to significantly improve the computation accuracy at the cost of extremely low additional power and area overheads. The proposed hybrid error reduction scheme utilizes only two input bits and adjusts the approximate outputs to reduce the error distance, which leads to an overall improvement in accuracy. The proposed design, when implemented in 65-nm CMOS technology, has 3, 2, and 2 times greater energy, power, and area efficiencies, respectively, than conventional accurate adders. In terms of accuracy, the proposed hybrid error reduction scheme lowers the error rate of the proposed adder to 50%, whereas those of the lower-part OR adder and optimized lower-part OR constant adder reach 68% and 85%, respectively. Furthermore, the proposed adder has up to 2.24, 2.24, and 1.16 times better performance with respect to the mean error distance, normalized mean error distance (NMED), and mean relative error distance, respectively, than the other approximate adders considered in this paper. Importantly, because of an excellent design tradeoff among delay, power, energy, and accuracy, the proposed adder is found to be the most competitive approximate adder when jointly analyzed in terms of the hardware cost and computation accuracy. Specifically, our proposed adder achieves 51%, 49%, and 47% reductions of the power-, energy-, and error-delay-product-NMED products, respectively, compared to the other considered approximate adders.
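For readers unfamiliar with the lower-part OR adder that this abstract uses as a comparison point, the following sketch models its commonly described behavior in software: the lower k bits are approximated with a bitwise OR, and the upper bits are added exactly, with a carry-in taken from the AND of the most significant lower-part bits. It models only that baseline, not the proposed hybrid error reduction scheme. The function names, the default bit widths, and the exhaustive error-rate helper are assumptions for illustration and are not meant to reproduce the figures reported above.

```python
def loa_add(a: int, b: int, k: int = 4) -> int:
    """Lower-part OR adder model for non-negative integers, with k >= 1."""
    low_mask = (1 << k) - 1
    low = (a | b) & low_mask                          # approximate lower sum
    carry_in = (a >> (k - 1)) & (b >> (k - 1)) & 1    # AND of top lower-part bits
    high = (a >> k) + (b >> k) + carry_in             # exact upper sum
    return (high << k) | low


def error_rate(width: int = 8, k: int = 4) -> float:
    """Fraction of all width-bit operand pairs whose approximate sum is wrong."""
    size = 1 << width
    wrong = sum(
        1
        for a in range(size)
        for b in range(size)
        if loa_add(a, b, k) != a + b
    )
    return wrong / (size * size)
```

For example, `loa_add(3, 1, k=2)` yields 3 instead of 4 because the carry out of the lowest bit is lost, while operands whose lower parts never overlap in a set bit are added exactly.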
