Fault-Tolerant Digital Circuits: Protection Techniques, CAD Tools and Emerging Applications

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Circuit and Signal Processing".

Deadline for manuscript submissions: closed (31 March 2021) | Viewed by 21118

Special Issue Editor


Prof. Dr. Juan Antonio Maestro
Guest Editor
ARIES Research Center, Universidad Nebrija, 28040 Madrid, Spain
Interests: computer architecture; digital design; fault-tolerance; reliability; small satellites and space applications

Special Issue Information

Dear Colleagues,

Reliability is a growing concern for electronic digital circuits. This is especially relevant for applications working in harsh environments, e.g., space, in which radiation sources may induce several phenomena that can jeopardize the behavior of the system. However, in recent years, this issue has extended to other application fields working at the ground level, mainly because of the use of smaller geometries. In this case, devices become more sensitive, and error sources are not limited to radiation; other causes include impurities in packaging and the use of reduced voltage levels in low-power modes. In order to tackle this problem, fault-tolerant circuits are a must. Leaving aside rad-hard fabrication methods, protection by design is usually an interesting approach to achieve the desired reliability. From innovative uses of standard duplication and triplication techniques (such as DMR and TMR) to ad-hoc reliable architectures, fault-tolerant design principles may be applied to a wide range of systems. These efforts are usually coupled with the protection of memories, using innovative error correction codes that increase the reliability of the applications. However, it is not only novel protection techniques that are necessary: tools and procedures to rapidly test the behavior of the new reliable architectures are also needed. In this respect, the design and implementation of CAD tools that can emulate error injection in digital designs is another way to boost research on fault-tolerant circuits. Finally, the advent of reliable digital circuits is fostering the use of applications with a focus on fault tolerance in a variety of fields such as space, transport, biomedical, nuclear, and high-energy physics. In this Special Issue, we are interested in exploring new protection techniques for electronic systems, CAD tools to perform reliability tests, and novel applications in different fields that can benefit from the use of fault-tolerant circuits.

Topics include, but are not limited to:

  • New protection techniques for digital circuits
  • Fault-tolerant computer architectures
  • Error correction codes (ECC) to increase reliability in innovative ways
  • Signal and image processing systems with a focus on reliability
  • CAD tools and methods to emulate error injection test processes
  • Novel applications of fault-tolerant digital circuits
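
As a rough illustration of two of the ideas named above, triplication (TMR) and emulated error injection, the following minimal Python sketch may help. It is not taken from any paper in this Special Issue; the function names `majority_vote` and `inject_bit_flip`, the 8-bit word width, and the test value are all assumptions made for the example.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority over three redundant copies of a word."""
    return (a & b) | (a & c) | (b & c)


def inject_bit_flip(word: int, bit: int) -> int:
    """Emulate a single-bit upset by flipping one bit of the word."""
    return word ^ (1 << bit)


WIDTH = 8              # assumed word width for this sketch
GOLDEN = 0b10110010    # assumed fault-free reference value

# Exhaustive single-fault campaign: flip every bit of every copy in turn
# and count how many faults the voter fails to mask.
unmasked_single = 0
for victim in range(3):
    for bit in range(WIDTH):
        copies = [GOLDEN, GOLDEN, GOLDEN]
        copies[victim] = inject_bit_flip(copies[victim], bit)
        if majority_vote(*copies) != GOLDEN:
            unmasked_single += 1

# Two copies corrupted at the same bit position defeat the voter.
double_fault = majority_vote(
    inject_bit_flip(GOLDEN, 0), inject_bit_flip(GOLDEN, 0), GOLDEN
)

print("unmasked single faults:", unmasked_single)
print("double fault masked:", double_fault == GOLDEN)
```

The sketch only models the voter in software. Real injection tools of the kind the call describes typically act on a hardware description or on FPGA configuration memory, which this example does not attempt to reproduce.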

Prof. Dr. Juan Antonio Maestro
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (5 papers)


Research

18 pages, 5144 KiB  
Article
Prediction-Based Error Correction for GPU Reliability with Low Overhead
by Hyunyul Lim, Tae Hyun Kim and Sungho Kang
Electronics 2020, 9(11), 1849; https://doi.org/10.3390/electronics9111849 - 05 Nov 2020
Cited by 2 | Viewed by 2229
Abstract
Scientific and simulation applications are continuously gaining importance in many fields of research and industries. These applications require massive amounts of memory and substantial arithmetic computation. Therefore, general-purpose computing on graphics processing units (GPGPU), which combines the computing power of graphics processing units (GPUs) and general CPUs, has been used for computationally intensive scientific and big data processing applications. Because current GPU architectures lack hardware support for error detection in computation logic, GPGPU has low reliability. Unlike graphics applications, errors in GPGPU can lead to serious problems in general-purpose computing applications. These applications are often intertwined with human life, meaning that errors can be life-threatening. Therefore, this paper proposes a novel prediction-based error correction method called Prediction-based Error Correction (PRECOR) for GPU reliability, which detects and corrects errors in GPGPU platforms with a focus on errors in computational elements. The implementation of the proposed architecture needs a small number of checkpoint buffers in order to fix errors in computational logic. The PRECOR architecture has prediction buffers and controller units for predicting erroneous outputs before performing rollback. Following a rollback, the architecture confirms the accuracy of its predictions. The proposed method effectively reduces the hardware and time overheads required to correct errors. Experimental results confirm that PRECOR efficiently fixes errors with low hardware and time overheads.

14 pages, 2771 KiB  
Article
A Single Error Correcting Code with One-Step Group Partitioned Decoding Based on Shared Majority-Vote
by Abhishek Das and Nur A. Touba
Electronics 2020, 9(5), 709; https://doi.org/10.3390/electronics9050709 - 26 Apr 2020
Cited by 1 | Viewed by 9850
Abstract
Technology scaling has led to an increase in density and capacity of on-chip caches. This has enabled higher throughput by enabling more low-latency memory transfers. With the reduction in size of SRAMs and development of emerging technologies, e.g., STT-MRAM, for on-chip cache memories, reliability of such memories becomes a major concern. Traditional error correcting codes, e.g., Hamming codes and orthogonal Latin square codes, suffer from either high decoding latency, which leads to lower overall throughput, or high memory overhead. In this paper, a new single error correcting code based on a shared majority voting logic is presented. The proposed codes trade off decoding latency in order to improve the memory overhead posed by orthogonal Latin square codes. A latency optimization technique is also proposed which lowers the decoding latency by incurring a slight memory overhead. It is shown that the proposed codes achieve better redundancy compared to orthogonal Latin square codes. The proposed codes are also shown to achieve lower decoding latency compared to Hamming codes. Thus, the proposed codes achieve a balanced trade-off between memory overhead and decoding latency, which makes them highly suitable for on-chip cache memories which have stringent throughput and memory overhead constraints.

9 pages, 941 KiB  
Article
Dual-Mode FPGA-Based Triple-TDC With Real-Time Calibration and a Triple Modular Redundancy Scheme
by Yuan-Ho Chen
Electronics 2020, 9(4), 607; https://doi.org/10.3390/electronics9040607 - 03 Apr 2020
Viewed by 2694
Abstract
This paper proposes a triple time-to-digital converter (TDC) for a field-programmable gate array (FPGA) platform with dual operation modes. First, the proposed triple-TDC employs the real-time calibration circuit followed by the traditional tapped delay line architecture to improve the environmental effect for the application of multiple TDCs. Second, the triple modular redundancy scheme is used to deal with the uncertainty in the FPGA device for improving the linearity for the application of a single TDC. The proposed triple-TDC is implemented in a Xilinx Virtex-5 FPGA platform and has a time resolution of 40 ps root mean square for multi-mode operation. Moreover, the ranges of differential nonlinearity and integral nonlinearity can be improved by 56% and 37%, respectively, for single-mode operation.

12 pages, 799 KiB  
Article
Analysis of the Critical Bits of a RISC-V Processor Implemented in an SRAM-Based FPGA for Space Applications
by Luis Alberto Aranda, Nils-Johan Wessman, Lucana Santos, Alfonso Sánchez-Macián, Jan Andersson, Roland Weigand and Juan Antonio Maestro
Electronics 2020, 9(1), 175; https://doi.org/10.3390/electronics9010175 - 17 Jan 2020
Cited by 24 | Viewed by 3720
Abstract
One of the traditional issues in space missions is the reliability of the electronic components on board spacecraft. There are numerous techniques to deal with this, from shielding and rad-hard fabrication to ad-hoc fault-tolerant designs. Although many of these solutions have been extensively studied, the recent utilization of FPGAs as the target architecture for many electronic components has opened new possibilities, partly due to the distinct nature of these devices. In this study, we performed fault injection experiments to determine if a RISC-V soft processor implemented in an FPGA could be used as an onboard computer for space applications, and how the specific nature of FPGAs needs to be tackled differently from how ASICs have been traditionally handled. In particular, in this paper, the classic definition of the cross-section is revisited, putting into perspective the importance of the so-called “critical bits” in an FPGA design.

48 pages, 4326 KiB  
Article
Adaptive-Hybrid Redundancy with Error Injection
by Nicolas Hamilton, Scott Graham, Timothy Carbino, James Petrosky and Addison Betances
Electronics 2019, 8(11), 1266; https://doi.org/10.3390/electronics8111266 - 01 Nov 2019
Cited by 2 | Viewed by 2007
Abstract
Adaptive-Hybrid Redundancy (AHR) shows promise as a method to allow flexibility when selecting between processing speed and energy efficiency while maintaining a level of error mitigation in space radiation environments. Whereas previous work demonstrated AHR’s feasibility in an error-free environment, this work analyzes AHR performance in the presence of errors. Errors are deliberately injected into AHR at specific times in the processing chain to demonstrate best- and worst-case performance impacts. This analysis demonstrates that AHR provides flexibility in processing speed and energy efficiency in the presence of errors.
