Approximate Computing: Design, Acceleration, Validation and Testing of Circuits, Architectures and Algorithms in Future Systems

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: closed (30 November 2021) | Viewed by 32566

Special Issue Editor


Dr. Alessandro Savino
Guest Editor
Politecnico di Torino, Corso Duca degli Abruzzi, 24, 10129 Torino, Italy
Interests: approximate computing; reliability assessment; software-based self-test; statistical models

Special Issue Information

Dear Colleagues,

In recent years, approximate computing (AC) has enabled breakthroughs in many scientific areas, bringing it a step closer to becoming one of the mainstream computing approaches in future systems. First, it is becoming more and more difficult to achieve significant performance improvements through the scaling of CMOS technology. Second, modern architectures range from HPC to embedded systems (e.g., IoT and autonomous driving), which creates the need for a trade-off between efficiency (in terms of memory and performance resources, and power consumption) and the quality of the final outcomes. For several application domains, especially those related to human perception, approximate results can be hard to distinguish from exact ones, which opens the way for system designers to apply AC.

Suitable solutions will not be fully realized in a single layer only. Therefore, applying AC in different layers of hardware, architecture, software and algorithms should be investigated. Moreover, while the hidden cost of AC is a reduction of an application’s inherent resiliency to errors, AC has also recently been demonstrated to be effective in safety-critical applications.

This Special Issue on AC will explore exciting, new ideas in the field of approximate computing, covering cross-layer design methodologies bridging the circuit, architecture and algorithm levels. It will also include connections between the AC paradigm and the safety, verification, testing and reliability of digital systems.

Topics for this Special Issue include (but are not limited to):

  • Analog and circuit-level approximation techniques
  • Approximation-induced error modeling and propagation
  • Approximation techniques for emerging processor and memory technologies
  • Architectural support for AC
  • Dependability of approximate circuits and systems
  • Design automation of AC architectures
  • Design of reconfigurable AC architectures
  • Error-resilient near-threshold computing
  • Hardware accelerators for approximation-tolerant application domains
  • Hardware/software co-design of AC systems
  • Language, compiler, and operating system support for approximate architectures
  • Safety and reliability applications of approximate computing
  • Techniques for monitoring and controlling approximation quality
  • Test and fault tolerance of approximate systems
  • Verification of approximate systems

Dr. Alessandro Savino
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Approximate Computing
  • Reconfigurable Systems
  • Hardware Accelerators
  • Safety-Critical Applications
  • Reliability Assessment
  • Fault Tolerance

Published Papers (11 papers)


Research


15 pages, 954 KiB  
Article
Hardware-Based Activation Function-Core for Neural Network Implementations
by Griselda González-Díaz_Conti, Javier Vázquez-Castillo, Omar Longoria-Gandara, Alejandro Castillo-Atoche, Roberto Carrasco-Alvarez, Adolfo Espinoza-Ruiz and Erica Ruiz-Ibarra
Electronics 2022, 11(1), 14; https://doi.org/10.3390/electronics11010014 - 22 Dec 2021
Cited by 3 | Viewed by 2926
Abstract
Today, embedded systems (ES) tend towards miniaturization and the carrying out of complex tasks in applications such as the Internet of Things, medical systems, telecommunications, among others. Currently, ES structures based on artificial intelligence using hardware neural networks (HNNs) are becoming more common. In the design of HNN, the activation function (AF) requires special attention due to its impact on the HNN performance. Therefore, implementing activation functions (AFs) with good performance, low power consumption, and reduced hardware resources is critical for HNNs. In light of this, this paper presents a hardware-based activation function-core (AFC) to implement an HNN. In addition, this work shows a design framework for the AFC that applies a piecewise polynomial approximation (PPA) technique. The designed AFC has a reconfigurable architecture with a wordlength-efficient decoder, i.e., reduced hardware resources are used to satisfy the desired accuracy. Experimental results show a better performance of the proposed AFC in terms of hardware resources and power consumption when it is compared with state of the art implementations. Finally, two case studies were implemented to corroborate the AFC performance in widely used ANN applications. Full article
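
As a rough illustration of the piecewise polynomial approximation idea mentioned in this abstract, the sketch below approximates a sigmoid with one first-degree polynomial per uniform segment. It is not the authors' AFC design: the function names, the range [-8, 8], the 32 segments, and the use of floating point instead of a fixed-point hardware datapath are all assumptions made for the example.

```python
import math


def sigmoid(x):
    """Reference activation function."""
    return 1.0 / (1.0 + math.exp(-x))


def build_segments(f, lo, hi, n_segments):
    """Fit one line per uniform segment by interpolating f at the segment ends."""
    width = (hi - lo) / n_segments
    segments = []
    for i in range(n_segments):
        x0 = lo + i * width
        x1 = x0 + width
        slope = (f(x1) - f(x0)) / width
        intercept = f(x0) - slope * x0
        segments.append((slope, intercept))
    return segments


# Assumed parameters for the illustration (not taken from the paper).
LO, HI, N_SEGMENTS = -8.0, 8.0, 32
SEGMENTS = build_segments(sigmoid, LO, HI, N_SEGMENTS)


def sigmoid_ppa(x):
    """Piecewise-linear sigmoid: saturate outside [LO, HI], look up a segment inside."""
    if x <= LO:
        return 0.0
    if x >= HI:
        return 1.0
    index = min(int((x - LO) / (HI - LO) * N_SEGMENTS), N_SEGMENTS - 1)
    slope, intercept = SEGMENTS[index]
    return slope * x + intercept


def max_abs_error(samples=2001, span=10.0):
    """Largest absolute deviation from the reference over an evenly spaced grid."""
    worst = 0.0
    for k in range(samples):
        x = -span + 2.0 * span * k / (samples - 1)
        worst = max(worst, abs(sigmoid_ppa(x) - sigmoid(x)))
    return worst
```

Calling `max_abs_error()` gives a quick view of the accuracy side of the trade-off; raising `N_SEGMENTS` or the polynomial degree reduces the error at the cost of a larger coefficient table.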

33 pages, 1489 KiB  
Article
Self-Adaptive Run-Time Variable Floating-Point Precision for Iterative Algorithms: A Joint HW/SW Approach
by Noureddine Ait Said, Mounir Benabdenbi and Katell Morin-Allory
Electronics 2021, 10(18), 2209; https://doi.org/10.3390/electronics10182209 - 09 Sep 2021
Viewed by 1469
Abstract
Using standard Floating-Point (FP) formats for computation leads to significant hardware overhead since these formats are over-designed for error-resilient workloads such as iterative algorithms. Hence, hardware FP Unit (FPU) architectures need run-time variable precision capabilities. In this work, we propose a new method and an FPU architecture that enable designers to dynamically tune FP computations’ precision automatically at run-time called Variable Precision in Time (VPT), leading to significant power consumption, execution time, and energy savings. In spite of its circuit area overhead, the proposed approach simplifies the integration of variable precision in existing software workloads at any level of the software stack (OS, RTOS, or application-level): it only requires lightweight software support and solely relies on traditional assembly instructions, without the need for a specialized compiler or custom instructions. We apply the technique on the Jacobi and the Gauss–Seidel iterative methods taking full advantage of the suggested FPU. For each algorithm, two modified versions are proposed: a conservative version and a relaxed one. Both algorithms are analyzed and compared statistically to understand the effects of VPT on iterative applications. The implementations demonstrate up to 70.67% power consumption saving, up to 59.80% execution time saving, and up to 88.20% total energy saving w.r.t the reference double precision implementation, and with no accuracy loss. Full article
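
To make the idea of run-time variable precision in an iterative solver more concrete, here is a minimal software-only sketch of a Jacobi iteration that starts in a narrow floating-point format and widens it as the iteration progresses. It does not reproduce the VPT method or the FPU described in the abstract: the precision is emulated with Python's `struct` rounding, and the promotion rule (`4 * EPSILON * scale`, plus an iteration budget per level) is a hypothetical choice made for this example.

```python
import struct

# struct format codes for IEEE 754 half, single, and double precision.
FORMATS = ("e", "f", "d")
# Machine epsilon of the two reduced formats, used by the promotion rule.
EPSILON = {"e": 2.0 ** -10, "f": 2.0 ** -23}


def quantize(value, fmt):
    """Round a Python float to the given format and back (emulates a narrower FPU)."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def jacobi_step(A, b, x, fmt):
    """One Jacobi sweep with every intermediate result rounded to `fmt`."""
    new_x = []
    for i, row in enumerate(A):
        acc = 0.0
        for j, a_ij in enumerate(row):
            if j != i:
                acc = quantize(acc + quantize(a_ij * x[j], fmt), fmt)
        new_x.append(quantize(quantize(b[i] - acc, fmt) / row[i], fmt))
    return new_x


def solve(A, b, tol=1e-12, max_iters=500, max_per_level=50):
    """Jacobi iteration that raises the precision when the current format stalls."""
    x = [0.0] * len(b)
    level = 0
    iters = [0] * len(FORMATS)
    for _ in range(max_iters):
        fmt = FORMATS[level]
        new_x = jacobi_step(A, b, x, fmt)
        delta = max(abs(p - q) for p, q in zip(new_x, x))
        x = new_x
        iters[level] += 1
        if level == len(FORMATS) - 1:
            if delta < tol:
                break
        else:
            scale = max(1.0, max(abs(v) for v in x))
            # Hypothetical rule: promote once updates shrink to rounding noise,
            # or after a fixed iteration budget at this precision.
            if delta <= 4 * EPSILON[fmt] * scale or iters[level] >= max_per_level:
                level += 1
    return x, iters
```

For a diagonally dominant system such as `A = [[4, 1, 1], [1, 5, 2], [1, 2, 6]]` with `b = [9, 17, 23]`, `solve(A, b)` returns the solution together with the number of sweeps spent at each precision, which is the kind of statistic one would inspect when weighing low-precision work against final accuracy.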

14 pages, 351 KiB  
Article
The Study of Monotonic Core Functions and Their Use to Build RNS Number Comparators
by Mikhail Babenko, Stanislaw J. Piestrak, Nikolay Chervyakov and Maxim Deryabin
Electronics 2021, 10(9), 1041; https://doi.org/10.3390/electronics10091041 - 28 Apr 2021
Cited by 2 | Viewed by 1674
Abstract
A non-positional residue number system (RNS) enjoys particularly efficient implementation of addition and multiplication, but non-modular arithmetic operations in RNS, such as number comparison, are known to be difficult. In this paper, a new technique for designing comparators of RNS numbers represented in an arbitrary moduli set is presented. It is based on using the core function, for which it was shown that it must be monotonic to allow for RNS number comparison. The conditions of the monotonicity of the core function were formulated, which also ensured the minimal range of the core function (essential to obtain the best characteristics of the comparator). The best choice is a core function in which only one coefficient corresponding to the largest modulus is set to 1 whereas all other coefficients are set to 0. It is also shown that the already known diagonal function is nothing else but the special case of the core function with all coefficients set to 1. Performance evaluation suggests that the new comparator uses less hardware and in some cases also introduces smaller delay than its counterparts based on the diagonal function. The potential applications of the new comparator include some recently developed homomorphic encryption algorithms implemented using RNS. Full article
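
The contrast between modular and non-modular RNS operations can be seen in a few lines. The sketch below is only a baseline illustration and not the core-function comparator of the paper: addition and multiplication work residue by residue, whereas the naive comparison shown here has to reconstruct the integers with the Chinese remainder theorem first. The moduli set (7, 11, 13) and all function names are assumptions chosen for the example.

```python
from math import prod

# Assumed pairwise-coprime moduli; the dynamic range is 7 * 11 * 13 = 1001.
MODULI = (7, 11, 13)


def to_rns(x):
    """Represent an integer by its residues modulo each modulus."""
    return tuple(x % m for m in MODULI)


def rns_add(a, b):
    """Addition is carry-free: each residue channel is independent."""
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))


def rns_mul(a, b):
    """Multiplication is also performed channel by channel."""
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))


def from_rns(residues):
    """Chinese remainder theorem reconstruction (the costly, non-modular step)."""
    M = prod(MODULI)
    total = 0
    for r, m in zip(residues, MODULI):
        M_i = M // m
        total += r * M_i * pow(M_i, -1, m)  # modular inverse, Python 3.8+
    return total % M


def rns_less_than(a, b):
    """Naive comparison: residues carry no order, so reconstruct both operands."""
    return from_rns(a) < from_rns(b)
```

For instance, `to_rns(8)` and `to_rns(7)` cannot be ordered by looking at the residues alone, which is why dedicated comparison functions that avoid the full reconstruction are of interest.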

16 pages, 1360 KiB  
Article
RNS Number Comparator Based on a Modified Diagonal Function
by Mikhail Babenko, Maxim Deryabin, Stanislaw J. Piestrak, Piotr Patronik, Nikolay Chervyakov, Andrei Tchernykh and Arutyun Avetisyan
Electronics 2020, 9(11), 1784; https://doi.org/10.3390/electronics9111784 - 27 Oct 2020
Cited by 13 | Viewed by 2621
Abstract
Number comparison has long been recognized as one of the most fundamental non-modular arithmetic operations to be executed in a non-positional Residue Number System (RNS). In this paper, a new technique for designing comparators of RNS numbers represented in an arbitrary moduli set is presented. It is based on a newly introduced modified diagonal function, whose strictly monotonic properties make it possible to replace the cumbersome operations of finding the remainder of the division by a large and awkward number with significantly simpler computations involving only a power of 2 modulus. Comparators of numbers represented in sample RNSs composed of varying numbers of moduli and offering different dynamic ranges, designed using various methods, were synthesized for the 65 nm technology. The experimental results suggest that the new circuits enjoy a delay reduction ranging from over 11% to over 75% compared to the fastest circuits designed using existing methods. Moreover, it is achieved using less hardware, the reduction of which reaches over 41%, and is accompanied by significantly reduced power-consumption, which in several cases exceeds 100%. Therefore, it seems that the presented method leads to the design of the most efficient current hardware comparators of numbers represented using a general RNS moduli set. Full article

15 pages, 1771 KiB  
Article
Survey on Approximate Computing and Its Intrinsic Fault Tolerance
by Gennaro Rodrigues, Fernanda Lima Kastensmidt and Alberto Bosio
Electronics 2020, 9(4), 557; https://doi.org/10.3390/electronics9040557 - 26 Mar 2020
Cited by 27 | Viewed by 4985
Abstract
This work is a survey on approximate computing and its impact on fault tolerance, especially for safety-critical applications. It presents a multitude of approximation methodologies, which are typically applied at software, architecture, and circuit level. Those methodologies are discussed and compared on all their possible levels of implementations (some techniques are applied at more than one level). Approximation is also presented as a means to provide fault tolerance and high reliability: Traditional error masking techniques, such as triple modular redundancy, can be approximated and thus have their implementation and execution time costs reduced compared to the state of the art. Full article

21 pages, 2313 KiB  
Article
HEAP: A Holistic Error Assessment Framework for Multiple Approximations Using Probabilistic Graphical Models
by Jiajia Jiao
Electronics 2020, 9(2), 373; https://doi.org/10.3390/electronics9020373 - 22 Feb 2020
Cited by 1 | Viewed by 2320
Abstract
Approximate computing has been a good paradigm of energy-efficient accelerator design. Accurate and fast error estimation is critical for selecting appropriate approximate techniques so that power saving (or performance improvement) can be maximized with acceptable output quality in approximate accelerators. In the paper, we propose HEAP, a Holistic Error assessment framework to characterize multiple Approximate techniques with Probabilistic graphical models (PGM) in a joint way. HEAP maps the problem of evaluating errors induced by different approximate techniques into a PGM issue, including: (1) A heterogeneous Bayesian network is represented by converting an application’s data flow graph, where various approximate options are {precise, approximate} two-state X*-type nodes, while input or operating variables are {precise, approximate, unacceptable} three-state X-type nodes. These two different kinds of nodes are separately used to configure the available approximate techniques and track the corresponding error propagation for guaranteed configurability; (2) node learning is accomplished via an approximate library, which consists of probability mass functions of multiple approximate techniques to quickly calculate each node’s Conditional Probability Table by mechanistic modeling or empirical modeling; (3) exact inference provides the probability distribution of output quality at three levels of precise, approximate, and unacceptable. We do a complete case study of 3 × 3 Gaussian kernels with different approximate configurations to verify HEAP. The comprehensive results demonstrate that HEAP is helpful to explore design space for power-efficient approximate accelerators, with just 4.18% accuracy loss and 3.34 × 10^5 speedup on average over Monte Carlo simulation. Full article

14 pages, 1084 KiB  
Article
FPGA-Based Hardware Matrix Inversion Architecture Using Hybrid Piecewise Polynomial Approximation Systolic Cells
by Javier Vázquez-Castillo, Alejandro Castillo-Atoche, Roberto Carrasco-Alvarez, Omar Longoria-Gandara and Jaime Ortegón-Aguilar
Electronics 2020, 9(1), 182; https://doi.org/10.3390/electronics9010182 - 18 Jan 2020
Cited by 6 | Viewed by 4498
Abstract
The hardware of the matrix inversion architecture using QR decomposition with Givens Rotations (GR) and a back substitution (BS) block is required for many signal processing algorithms. However, the hardware of the GR algorithm requires the implementation of complex operations, such as the reciprocal square root (RSR), which is typically implemented using LookUp Tables (LUTs) or COordinate Rotation DIgital Computers (CORDICs), among others, leading to either high area consumption or low throughput. This paper introduces a Field-Programmable Gate Array (FPGA)-based full matrix inversion architecture using hybrid piecewise polynomial approximation systolic cells. In the design, a hybrid segmentation technique was incorporated for the implementation of piecewise polynomial systolic cells. This hybrid approach is composed of an external and an internal segmentation, where the first is nonuniform and the second is uniform, fitting the curve shape of the complex functions and achieving a better signal-to-quantization-noise ratio; furthermore, it improves the time performance and area resources. Experimental results reveal a well-balanced improvement in the design, achieving high throughput and, hence, less resource utilization in comparison to state-of-the-art FPGA-based architectures. In our study, the proposed design achieves 7.51 Mega-Matrices per second for performing 4 × 4 matrix operations with a latency of 12 clock cycles; meanwhile, the hardware design requires only 1474 slice registers and 1458 LUTs in an FPGA Virtex-5 XC5VLX220T, and 1474 slice registers and 1378 LUTs when an FPGA Virtex-6 XC6VLX240T is used. Full article

18 pages, 403 KiB  
Article
Using Approximate Computing and Selective Hardening for the Reduction of Overheads in the Design of Radiation-Induced Fault-Tolerant Systems
by Alexander Aponte-Moreno, Felipe Restrepo-Calle and Cesar Pedraza
Electronics 2019, 8(12), 1539; https://doi.org/10.3390/electronics8121539 - 13 Dec 2019
Cited by 3 | Viewed by 2740
Abstract
Fault mitigation techniques based on pure software, known as software-implemented hardware fault tolerance (SIHFT), are very attractive for use in COTS (commercial off-the-shelf) microprocessors because they do not require physical modification of the system. However, these techniques cause software overheads that may affect the efficiency and costs of the overall system. This paper presents a design method of radiation-induced fault-tolerant microprocessor-based systems with lower execution time overheads. For this purpose, approximate computing and selective fault mitigation software-based techniques are used; thus it can be used in COTS devices. The proposal is validated through a case study for the TI MSP430 microcontroller. Results show that the designer can choose among a wide spectrum of design configurations, exploring different trade-offs between reliability, performance, and accuracy of results. Full article

17 pages, 909 KiB  
Article
A High-Speed Division Algorithm for Modular Numbers Based on the Chinese Remainder Theorem with Fractions and Its Hardware Implementation
by Nikolai Chervyakov, Pavel Lyakhov, Mikhail Babenko, Anton Nazarov, Maxim Deryabin, Irina Lavrinenko and Anton Lavrinenko
Electronics 2019, 8(3), 261; https://doi.org/10.3390/electronics8030261 - 27 Feb 2019
Cited by 11 | Viewed by 3491
Abstract
In this paper, a new simplified iterative division algorithm for modular numbers that is optimized on the basis of the Chinese remainder theorem (CRT) with fractions is developed. It requires fewer computational resources than the CRT with integers and mixed radix number systems (MRNS). The main idea of the algorithm is (a) to transform the residual representation of the dividend and divisor into a weighted fixed-point code and (b) to find the highest power of 2 in the divisor written in a residue number system (RNS). This information is acquired using the CRT with fractions: the highest power is defined by the number of zeros standing before the first significant digit. All intermediate calculations of the algorithm involve the operations of right shift and subtraction, which explains its good performance. Due to the abovementioned techniques, the algorithm has higher speed and consumes fewer computational resources, thereby being more appropriate for the multidigit division of modular numbers than the algorithms described earlier. The new algorithm suggested in this paper has O(log2 Q) iterations, where Q is the quotient. For multidigit numbers, its modular division complexity is Q(N), where N denotes the number of bits in a certain fraction required to restore the number by remainders. Since the number N is written in a weighted system, the subtraction-based comparison runs very fast. Hence, this algorithm might be the best currently available. Full article

18 pages, 6672 KiB  
Article
A Novel Multicomponent PSO Algorithm Applied in FDE–AJTF Decomposition
by Lei Yu, Guochao Lao, Chunsheng Li, Yang Sun and Yingying Li
Electronics 2019, 8(1), 51; https://doi.org/10.3390/electronics8010051 - 02 Jan 2019
Cited by 1 | Viewed by 2478
Abstract
The echo of maneuvering targets can be expressed as a multicomponent polynomial phase signal (mc-PPS), which should be processed by time-frequency analysis methods. The frequency domain extraction-based adaptive joint time frequency (FDE–AJTF) decomposition method, a modified maximum likelihood (ML) method, is an effective tool for this. However, the key procedure in the FDE–AJTF method is searching for the optimal parameters in the solution space, which is essentially a multidimensional optimization problem with different extremal solutions. To solve this problem, a novel multicomponent particle swarm optimization (mc-PSO) algorithm is presented and applied in the FDE–AJTF decomposition. Building on the standard PSO, it can extract several components simultaneously: the population is divided into three groups, and the neighborhood of the best particle in the optimal group is set as the forbidden area for the suboptimal group, so that two different independent components can be obtained and extracted in one extraction. To analyze its performance, three simulation tests are carried out and compared with a standard PSO, a genetic algorithm, and a differential evolution algorithm. The tests verify that the mc-PSO has the best performance: its convergence, accuracy, and stability are improved, while its searching times and computation are reduced. Full article

Review


15 pages, 3165 KiB  
Review
Gate-Level Static Approximate Adders: A Comparative Analysis
by Padmanabhan Balasubramanian, Raunaq Nayar and Douglas L. Maskell
Electronics 2021, 10(23), 2917; https://doi.org/10.3390/electronics10232917 - 25 Nov 2021
Cited by 2 | Viewed by 1841
Abstract
Approximate or inaccurate addition is found to be viable for practical applications which have an inherent error tolerance. Approximate addition is realized using an approximate adder, and many approximate adder designs have been put forward in the literature targeting an acceptable trade-off between quality of results and savings in design metrics compared to the accurate adder. Approximate adders can be classified into three categories as: (a) suitable for FPGA implementation, (b) suitable for ASIC type implementation, and (c) suitable for FPGA and ASIC type implementations. Among these, approximate adders, which are suitable for FPGA and ASIC type implementations are particularly interesting given their versatility and they are typically designed at the gate level. Depending on the way approximation is built into an approximate adder, approximate adders can be classified into two kinds as static approximate adders and dynamic approximate adders. This paper compares and analyzes static approximate adders which are suitable for both FPGA and ASIC type implementations. We consider many static approximate adders and evaluate their performance for a digital image processing application using standard figures of merit such as peak signal to noise ratio and structural similarity index metric. We provide the error metrics of approximate adders, and the design metrics of accurate and approximate adders corresponding to FPGA and ASIC type implementations. For the FPGA implementation, we considered a Xilinx Artix-7 FPGA, and for an ASIC type implementation, we considered a 32/28 nm CMOS standard digital cell library. While the inferences from this work could serve as a useful reference to determine an optimum static approximate adder for a practical application, in particular, we found approximate adders HOAANED, HERLOA and M-HERLOA to be preferable. Full article
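
For readers unfamiliar with static approximate adders, the sketch below models one classic design from the literature, the lower-part OR adder (LOA): the least significant bits are combined with a bitwise OR, the upper bits are added exactly, and the carry into the upper part is the AND of the most significant bits of the two lower parts. It is offered purely as a behavioral illustration; the abstract does not list which adders were evaluated, and the 8-bit width, the 4 approximated bits, and the function names are assumptions.

```python
def loa_add(a, b, width=8, approx_bits=4):
    """Behavioral model of a lower-part OR adder for unsigned operands."""
    full_mask = (1 << width) - 1
    low_mask = (1 << approx_bits) - 1
    a &= full_mask
    b &= full_mask
    # Inexact lower part: bitwise OR instead of addition.
    low = (a | b) & low_mask
    # Carry into the exact part: AND of the top bits of the two lower parts.
    if approx_bits > 0:
        carry = (a >> (approx_bits - 1)) & (b >> (approx_bits - 1)) & 1
    else:
        carry = 0
    # Exact upper part.
    high = (a >> approx_bits) + (b >> approx_bits) + carry
    return (high << approx_bits) | low


def error_stats(width=8, approx_bits=4):
    """Mean and worst-case error distance over all input pairs (exhaustive)."""
    total = 0
    worst = 0
    for a in range(1 << width):
        for b in range(1 << width):
            err = abs(loa_add(a, b, width, approx_bits) - (a + b))
            total += err
            worst = max(worst, err)
    return total / (1 << (2 * width)), worst
```

`error_stats()` returns the kind of error metrics (mean and maximum error distance) that are typically reported next to area, delay, and power when such adders are compared; setting `approx_bits=0` recovers the accurate adder.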
