VLSI Architecture Design for Digital Signal Processing

A special issue of Electronics (ISSN 2079-9292).

Deadline for manuscript submissions: closed (31 May 2019) | Viewed by 42537

Special Issue Editor


Guest Editor
Department of Information and Communication Engineering, Inha University, Incheon 22212, Republic of Korea
Interests: VLSI architectures for DSP; forward error correction architectures; hardware cryptographic architectures; artificial intelligence hardware design

Special Issue Information

Dear Colleagues,

Digital signal processing (DSP) is an enabling technology for many applications, such as video, speech, wired and wireless communications, and multimedia. With the advent of high levels of integration on a single silicon substrate, a new generation of integrated circuits has been developed that is directly suited to performing DSP functions. VLSI architecture design for DSP focuses on design methodologies for realizing dedicated VLSI systems for signal processing and communication applications, including architectures for big data computing and machine learning. These methodologies are used to explore area–power–speed tradeoffs for different DSP applications. Beyond achieving the capacity to process high-speed data, minimizing area and power consumption is another important constraint in VLSI implementations.

The main aim of this Special Issue is to seek high-quality submissions that highlight emerging applications and address recent breakthroughs in VLSI architecture design for DSP, including the design and analysis of signal processing algorithms and architectures, performance analysis of signal processing systems, VLSI design methodology, and the design of arithmetic circuits and VLSI components used in signal processing.

The topics of interest include, but are not limited to:

  • Design and implementation of signal processing systems
  • Machine learning architectures for DSP
  • Circuits and systems for signal processing and communications
  • Cryptography architectures and hardware security
  • Forward error correction architectures
  • Multimedia signal processing systems
  • Adaptive digital processing systems with FPGA components
  • VLSI signal processing architectures
  • Special purpose signal processing architectures
  • SoC designs for DSP
  • DSP algorithms implemented in VLSI systems
  • Embedded architectures and systems

Prof. Dr. Hanho Lee
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (9 papers)

Research

13 pages, 942 KiB  
Article
High-Throughput and Low-Latency Digital Baseband Architecture for Energy-Efficient Wireless VR Systems
by Seokha Hwang, Seungsik Moon, Dongyun Kam, Inn-Yeal Oh and Youngjoo Lee
Electronics 2019, 8(7), 815; https://doi.org/10.3390/electronics8070815 - 22 Jul 2019
Cited by 4 | Viewed by 4675
Abstract
This paper presents a novel baseband architecture that supports high-speed wireless VR solutions using 60 GHz RF circuits. Based on experimental observations from our previous 60 GHz transceiver circuits, an efficient baseband architecture is proposed to enhance the quality of transmission. To achieve a zero-latency transmission, we define a (106,920, 95,040) interleaved-BCH error-correction code (ECC), which removes the iterative processing steps of the previous LDPC ECC standardized for near-field wireless communication. By introducing block-level interleaving, the proposed baseband processing scatters burst errors across the small component codes and recovers up to 1080 consecutive bit errors in a data frame of 106,920 bits. To support the high-speed wireless VR system, we also design a massively parallel BCH encoder and decoder, which are tightly connected to the block-level interleaver and de-interleaver. Including the high-speed analog interfaces for external devices, the proposed baseband architecture is designed in 65 nm CMOS and supports a data rate of up to 12.8 Gbps. Experimental results show that the proposed wireless VR solution can transfer up to 4K high-resolution video streams without time-consuming compression and decompression, achieving a transfer latency of 1 ms.
(This article belongs to the Special Issue VLSI Architecture Design for Digital Signal Processing)
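
The block-level interleaving idea in the abstract above can be illustrated with a minimal sketch. The parameters below (`DEPTH`, `LENGTH`, the burst position) are small hypothetical values chosen for readability, not the paper's (106,920, 95,040) code, and the function names are illustrative only. The sketch shows that a burst of consecutive channel errors no longer than the interleaving depth hits each component codeword at most once.

```python
def interleave(codewords):
    """Send bit 0 of every codeword, then bit 1 of every codeword, and so on."""
    depth, length = len(codewords), len(codewords[0])
    return [codewords[r][c] for c in range(length) for r in range(depth)]


def deinterleave(stream, depth):
    """Inverse of interleave(): rebuild the `depth` component codewords."""
    length = len(stream) // depth
    return [[stream[c * depth + r] for c in range(length)] for r in range(depth)]


def errors_per_codeword(depth, length, burst_start, burst_len):
    """Flip a burst of consecutive bits in an all-zero frame and count
    how many errors land in each component codeword after de-interleaving."""
    stream = interleave([[0] * length for _ in range(depth)])
    for i in range(burst_start, burst_start + burst_len):
        stream[i] ^= 1
    return [sum(cw) for cw in deinterleave(stream, depth)]


# Hypothetical toy parameters: 8 component codewords of 15 bits each.
DEPTH, LENGTH = 8, 15
hits = errors_per_codeword(DEPTH, LENGTH, burst_start=37, burst_len=DEPTH)
print(hits)  # one error per component codeword
```

Under these assumptions, component codes that each correct a single error would be enough to recover the whole burst; a burst longer than the depth would place two errors in at least one codeword.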

15 pages, 961 KiB  
Article
Efficient QC-LDPC Encoder for 5G New Radio
by Tram Thi Bao Nguyen, Tuy Nguyen Tan and Hanho Lee
Electronics 2019, 8(6), 668; https://doi.org/10.3390/electronics8060668 - 13 Jun 2019
Cited by 51 | Viewed by 8771
Abstract
This paper presents a novel efficient encoding method and a high-throughput, low-complexity encoder architecture for quasi-cyclic low-density parity-check (QC-LDPC) codes for the 5th-generation (5G) New Radio (NR) standard. By storing the quantized value of the permutation information for each submatrix instead of the whole parity check matrix, the required memory storage size is considerably reduced. In addition, sharing techniques are employed to reduce the hardware complexity. An analysis of the encoding complexity of the proposed method indicated a substantial reduction in the required area and memory storage compared with existing state-of-the-art encoding approaches. The proposed method requires only 61% of the gate area and 11% of the ROM storage of a similar LDPC encoder using the Richardson–Urbanke method. Synthesis was carried out on TSMC 65-nm complementary metal-oxide semiconductor (CMOS) technology with different submatrix sizes, which confirmed that the design methodology is flexible and can be adapted for multiple submatrix sizes. For all the considered submatrix sizes, the throughput ranged from 22.1 to 202.4 Gbps, which meets the throughput requirement of the 5G NR standard.

13 pages, 1904 KiB  
Article
VLSI Implementation of Restricted Coulomb Energy Neural Network with Improved Learning Scheme
by Jaechan Cho, Yongchul Jung, Seongjoo Lee and Yunho Jung
Electronics 2019, 8(5), 563; https://doi.org/10.3390/electronics8050563 - 22 May 2019
Cited by 5 | Viewed by 4000
Abstract
This paper proposes a restricted Coulomb energy neural network (RCE-NN) with an improved learning algorithm and presents the hardware architecture design and VLSI implementation results. The learning algorithm of the existing RCE-NN applies an inefficient radius adjustment, such as learning all neurons at the same radius or reducing the radius excessively in the learning process. Moreover, since the reliability used for eliminating unnecessary neurons is estimated without considering the activation region of each neuron, it is inaccurate and leaves unnecessary neurons in place. To overcome this problem, the proposed learning algorithm divides each neuron region in the learning process and measures the reliability with different factors for each region. In addition, it applies a process of gradual radius reduction by a pre-defined reduction rate. In performance evaluations using two datasets, RCE-NN with the proposed learning algorithm showed high recognition accuracy with fewer neurons compared to existing RCE-NNs. The proposed RCE-NN processor was implemented with 197.8K logic gates in 0.535 mm² using a 55 nm CMOS process and operated at a clock frequency of 150 MHz.

16 pages, 910 KiB  
Article
An Efficient Hardware Accelerator for the MUSIC Algorithm
by Hui Chen, Kai Chen, Kaifeng Cheng, Qinyu Chen, Yuxiang Fu and Li Li
Electronics 2019, 8(5), 511; https://doi.org/10.3390/electronics8050511 - 08 May 2019
Cited by 10 | Viewed by 3755
Abstract
As a classical DOA (direction of arrival) estimation algorithm, the multiple signal classification (MUSIC) algorithm can estimate the direction of signal incidence. A major bottleneck in the application of this algorithm is its large computational load, so the focus is on accelerating the algorithm to meet real-time and high-precision requirements. In this paper, we design an efficient and reconfigurable accelerator to implement the MUSIC algorithm. First, we propose a hardware-friendly MUSIC algorithm without the eigenstructure decomposition of the covariance matrix, which is time consuming and accounts for about 60% of the whole computation. Furthermore, to reduce the computation of the covariance matrix, this paper exploits its conjugate symmetry property together with iterative storage, which also lessens memory access time. Finally, we adopt a stepwise search method to realize the spectral peak search, which can meet the requirements of 1° and 0.1° precision. The accelerator can operate at a maximum frequency of 1 GHz with an area of 4,765,475.4 μm², and the power dissipation is 238.27 mW after gate-level synthesis in TSMC 40-nm CMOS technology with the Synopsys Design Compiler. Our implementation can accelerate the algorithm to meet the real-time and high-precision requirements of applications. For an eight-element uniform linear array, a single signal source, and 128 snapshots, the computation times of the algorithm in our architecture are 2.8 μs and 22.7 μs for covariance matrix estimation and spectral peak search, respectively.
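
The conjugate symmetry mentioned in the abstract above can be sketched in a few lines. This is a plain software illustration, not the paper's accelerator; the function name, the array size `M`, the snapshot count `N`, and the synthetic data are all assumptions made for the example. Because the sample covariance matrix is Hermitian, only the upper triangle needs to be computed, and the lower triangle follows by conjugation.

```python
import cmath


def sample_covariance(snapshots):
    """Return R = (1/N) * sum(x x^H) for a list of N length-M complex vectors,
    computing only the upper triangle and mirroring it by conjugation."""
    n, m = len(snapshots), len(snapshots[0])
    r = [[0j] * m for _ in range(m)]
    for i in range(m):
        for j in range(i, m):  # upper triangle only: M(M+1)/2 entries
            acc = sum(x[i] * x[j].conjugate() for x in snapshots) / n
            r[i][j] = acc
            if i != j:
                r[j][i] = acc.conjugate()  # Hermitian mirror, no extra multiplies
    return r


# Hypothetical synthetic data: 4 array elements, 16 snapshots.
M, N = 4, 16
snapshots = [
    [cmath.exp(1j * (0.3 * k * m + 0.1 * k)) + 0.01 * (m - k % 3) for m in range(M)]
    for k in range(N)
]
R = sample_covariance(snapshots)
```

In this sketch the number of accumulated entries drops from M² to M(M+1)/2, which is the kind of saving the abstract attributes to the symmetry property.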

14 pages, 1241 KiB  
Article
Some Structures of Parallel VLSI-Oriented Processing Units for Implementation of Small Size Discrete Fractional Fourier Transforms
by Aleksandr Cariow, Janusz Papliński and Dorota Majorkowska-Mech
Electronics 2019, 8(5), 509; https://doi.org/10.3390/electronics8050509 - 08 May 2019
Cited by 8 | Viewed by 2376
Abstract
Discrete orthogonal transforms such as the discrete Fourier transform, discrete cosine transform, discrete Hartley transform, etc., are important tools in numerical analysis, signal processing, and statistical methods. The successful application of transform techniques relies on the existence of efficient fast algorithms for their implementation. A special place in the list of transformations is occupied by the discrete fractional Fourier transform (DFrFT). In this paper, some parallel algorithms and processing unit structures for fast DFrFT implementation are proposed. The approach is based on the resourceful factorization of DFrFT matrices. Some parallel algorithms and processing unit structures for small-size DFrFTs such as N = 2, 3, 4, 5, 6, and 7 are presented. In each case, we describe only the most important part of the structures of the processing units, neglecting the description of the auxiliary units and the control circuits.

13 pages, 2095 KiB  
Article
Efficient-Scheduling Parallel Multiplier-Based Ring-LWE Cryptoprocessors
by Tuy Nguyen Tan and Hanho Lee
Electronics 2019, 8(4), 413; https://doi.org/10.3390/electronics8040413 - 09 Apr 2019
Cited by 9 | Viewed by 4076
Abstract
This paper presents a novel architecture for ring learning with errors (ring-LWE) cryptoprocessors using an efficient approach in encryption and decryption operations. By scheduling multipliers to work in parallel, the encryption and decryption times are significantly reduced. In addition, polynomial multiplications are conducted using radix-2 and radix-8 multiple delay feedback (MDF) architecture-based number theoretic transform (NTT) multipliers to speed up the multiplication operation. To reduce the hardware complexity of an NTT multiplier, three bit-reverse operations during the NTT and inverse NTT (INTT) processes are removed. Polynomial additions in the ring-LWE encryption phase are also arranged to work simultaneously to reduce the latency. As a result, the proposed efficient-scheduling parallel multiplier-based ring-LWE cryptoprocessors can achieve higher throughput and efficiency compared with existing architectures. The proposed ring-LWE cryptoprocessors are synthesized and verified using Xilinx VIVADO on a Virtex-7 field programmable gate array (FPGA) board. With security parameters n = 512 and q = 12,289, the proposed cryptoprocessors using radix-2 single-path delay feedback (SDF), radix-2 MDF, and radix-8 MDF multipliers perform encryption in 4.58 μs, 1.97 μs, and 0.89 μs, and decryption in 4.35 μs, 1.82 μs, and 0.71 μs, respectively. A comparison of the obtained throughput and efficiency with those of previous studies shows that the proposed cryptoprocessors achieve better performance.
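
NTT-based polynomial multiplication, as referred to in the abstract above, can be sketched in software as follows. This is a minimal illustration under stated assumptions: it uses the modulus q = 12,289 quoted in the abstract but a small hypothetical n = 8, a direct O(n²) transform instead of the radix-2 or radix-8 MDF pipelines of the paper, and helper names (`find_psi`, `ntt`, `intt`, `negacyclic_mul`) invented for the example. It multiplies two polynomials modulo x^n + 1 and checks the result against schoolbook multiplication. The modular inverse via `pow(x, -1, q)` requires Python 3.8 or later.

```python
Q = 12289  # modulus quoted in the abstract


def find_psi(n, q=Q):
    """Search for a primitive 2n-th root of unity mod q (n a power of two)."""
    assert (q - 1) % (2 * n) == 0
    for x in range(2, q):
        psi = pow(x, (q - 1) // (2 * n), q)
        if pow(psi, n, q) == q - 1:  # psi^n = -1 means the order is exactly 2n
            return psi
    raise ValueError("no primitive 2n-th root of unity found")


def ntt(a, psi, q=Q):
    """Evaluate a(x) at the odd powers of psi, i.e. at the roots of x^n + 1."""
    n = len(a)
    return [sum(a[j] * pow(psi, (2 * i + 1) * j, q) for j in range(n)) % q
            for i in range(n)]


def intt(a_hat, psi, q=Q):
    """Inverse of ntt(): interpolate the coefficients back."""
    n = len(a_hat)
    n_inv, psi_inv = pow(n, -1, q), pow(psi, -1, q)
    return [n_inv * sum(a_hat[i] * pow(psi_inv, (2 * i + 1) * j, q)
                        for i in range(n)) % q
            for j in range(n)]


def negacyclic_mul(a, b, q=Q):
    """Multiply a and b modulo (x^n + 1, q) by pointwise products in the NTT domain."""
    psi = find_psi(len(a), q)
    c_hat = [x * y % q for x, y in zip(ntt(a, psi, q), ntt(b, psi, q))]
    return intt(c_hat, psi, q)


def schoolbook_mul(a, b, q=Q):
    """Reference O(n^2) multiplication modulo (x^n + 1, q)."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            sign = 1 if i + j < n else -1  # x^n wraps around as -1
            c[(i + j) % n] = (c[(i + j) % n] + sign * a[i] * b[j]) % q
    return c


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [12288, 0, 5, 0, 0, 0, 0, 1]
assert negacyclic_mul(a, b) == schoolbook_mul(a, b)
```

A hardware design would replace the direct sums with a butterfly network; the sketch only shows why pointwise multiplication in the transform domain gives the same product.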

13 pages, 1275 KiB  
Article
Design of Cascaded CORDIC Based on Precise Analysis of Critical Path
by Pramod Kumar Meher and Sang Yoon Park
Electronics 2019, 8(4), 382; https://doi.org/10.3390/electronics8040382 - 29 Mar 2019
Cited by 14 | Viewed by 3945
Abstract
A conventional coordinate rotation digital computer (CORDIC) has a low throughput rate due to its recursive implementation of micro-rotations. In contrast, a fully-pipelined cascaded CORDIC provides a very high throughput rate at the cost of high complexity and large area. In this paper, possible design choices of cascaded CORDIC are explored over a wide range of operating frequencies, throughput rates, latency, and area complexity. For this purpose, we present a fine-grained critical path analysis of the cascaded CORDIC in terms of bit-level delay. Based on the propagation delay estimate, we propose an algorithm for determining the required number of pipeline stages and the locations of the pipeline registers in order to meet the timing constraint of a particular application. A hybrid cascaded-recursive CORDIC is also proposed to increase the throughput rate and to reduce the latency and energy per sample (EPS). From synthesis results, we show that the proposed pipelined cascaded CORDIC with only four pipeline stages requires 31.1% less area and 29.0% less EPS compared to a fully-pipelined CORDIC. An eight-stage pipelined recursive cascaded CORDIC provides 18.3% less EPS and 40.4% less area-delay product than a conventional CORDIC.
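
The micro-rotations mentioned in the abstract above can be sketched as a rotation-mode CORDIC loop. This is a floating-point illustration only, with an assumed function name and an assumed default of 16 iterations; it says nothing about the bit-level critical path or pipeline placement studied in the paper. Each loop iteration is one micro-rotation, using only shifts (here, multiplications by powers of two) and additions.

```python
import math


def cordic_rotate(angle, iterations=16):
    """Rotation-mode CORDIC sketch: approximate (cos(angle), sin(angle))
    for |angle| up to about 1.74 rad using shift-and-add micro-rotations."""
    # The micro-rotations stretch the vector by a constant gain; pre-scale by 1/gain.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = 1.0 / gain, 0.0, angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0  # rotate toward zero residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)  # table lookup in a hardware design
    return x, y


c, s = cordic_rotate(0.5)
print(c, s)  # close to math.cos(0.5), math.sin(0.5)
```

In a recursive design the same hardware stage would be reused for every iteration, while a cascaded design would unroll the loop into one stage per micro-rotation, which is the trade-off the abstract describes.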

12 pages, 5598 KiB  
Article
A Variation-Aware Design Methodology for Distributed Arithmetic
by Yue Lu, Shengyu Duan, Basel Halak and Tom Kazmierski
Electronics 2019, 8(1), 108; https://doi.org/10.3390/electronics8010108 - 18 Jan 2019
Cited by 2 | Viewed by 3923
Abstract
Distributed arithmetic (DA) brings area and power benefits to digital designs relevant to the Internet-of-Things. Therefore, new error-resilient techniques for DA computation are urgently required to improve robustness against process, voltage, and temperature (PVT) variations. This paper proposes a new in-situ timing error prevention technique to mitigate the impact of variations in DA circuits by providing a guardband for the most significant bit (MSB) computations. This guardband is initially achieved by modifying the sign extension block and by careful gate sizing. As a result, the least significant bit (LSB) computation can correspond to the critical path, and timing errors can be tolerated at the cost of an acceptable accuracy loss. Our approach is demonstrated on a 16-tap finite impulse response (FIR) filter using a 65 nm CMOS process, and the simulation results show that this design can still maintain high-accuracy performance without a worst-case timing margin, and achieve up to 32% power savings by voltage scaling when the worst-case margin is considered, with only 9% area overhead.
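
For readers unfamiliar with distributed arithmetic, the basic bit-serial scheme can be sketched as follows. This is a generic textbook-style illustration, not the variation-aware design of the paper; the function names, the 4-tap coefficient set, and the 4-bit word width are assumptions made for the example. The inner product of fixed coefficients with two's-complement samples is computed without multipliers: a look-up table holds every possible sum of coefficients, one bit-slice of the samples addresses it per cycle, and the slice for the sign bit is subtracted.

```python
def build_lut(coeffs):
    """Table of all 2^K partial sums: entry `addr` adds coeffs[i] for each set bit i."""
    k = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << k)]


def da_inner_product(coeffs, samples, width):
    """Bit-serial DA: sum(coeffs[i] * samples[i]) for `width`-bit two's-complement samples."""
    lut = build_lut(coeffs)
    acc = 0
    for b in range(width):
        # Gather bit b of every sample into one LUT address.
        addr = 0
        for i, x in enumerate(samples):
            addr |= ((x >> b) & 1) << i
        term = lut[addr] << b
        # The MSB of a two's-complement word carries negative weight.
        acc += -term if b == width - 1 else term
    return acc


# Hypothetical 4-tap example with 4-bit signed samples (range -8..7).
COEFFS = [3, -1, 4, 2]
SAMPLES = [5, -3, 7, -8]
result = da_inner_product(COEFFS, SAMPLES, width=4)
print(result)  # 30
```

The sketch also makes the abstract's concern visible: the MSB slice carries the largest weight, so an error in that step affects the result far more than an error in the LSB slice.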

13 pages, 829 KiB  
Article
Soft-Decision Low-Complexity Chase Decoders for the RS(255,239) Code
by Vicente Torres, Javier Valls, Maria Jose Canet and Francisco García-Herrero
Electronics 2019, 8(1), 10; https://doi.org/10.3390/electronics8010010 - 21 Dec 2018
Cited by 5 | Viewed by 3679
Abstract
In this work, we present a new architecture for soft-decision Reed–Solomon (RS) Low-Complexity Chase (LCC) decoding. The proposed architecture is scalable and can be used for a high number of test vectors. We propose a novel Multiplicity Assignment stage that sorts and stores only the location of the errors inside the symbols and the powers of α that identify the positions of the symbols in the frame. Novel schematics for the Syndrome Update and Symbol Modification blocks that are adapted to the proposed sorting stage are also presented. We also propose novel solutions for the problems that arise when a high number of test vectors is processed. We implemented three decoders: an η = 4 LCC decoder and two decoders that only decode 31 and 60 test vectors of true η = 5 and η = 6 LCC decoders, respectively. For example, our η = 4 decoder requires 29% fewer look-up tables in Virtex-V Field Programmable Gate Array (FPGA) devices than the best soft-decision RS decoder published to date, while achieving a 0.07 dB coding gain over that decoder.