Article

High Area-Efficient Parallel Encoder with Compatible Architecture for 5G LDPC Codes

School of Computer, National University of Defense Technology, Changsha 410073, China
*
Author to whom correspondence should be addressed.
Symmetry 2021, 13(4), 700; https://doi.org/10.3390/sym13040700
Submission received: 5 March 2021 / Revised: 9 April 2021 / Accepted: 13 April 2021 / Published: 16 April 2021
(This article belongs to the Special Issue Information Technologies and Electronics Ⅱ)

Abstract

This paper presents a novel low-complexity parallel quasi-cyclic low-density parity-check (QC-LDPC) encoding algorithm that is compatible with the 5th generation (5G) new radio (NR) standard. Based on the algorithm, we propose a high area-efficient parallel encoder with a compatible architecture. The proposed encoder has the advantages of parallel encoding and pipelined operations. Furthermore, it is designed as a configurable encoding structure, which is fully compatible with the different base graphs of 5G LDPC. Thus, the encoder architecture adapts flexibly to various 5G LDPC codes. The proposed encoder was synthesized in a 65 nm CMOS technology. According to the encoder architecture, we implemented nine encoders for distributed lifting sizes of the two base graphs. The experimental results show that the encoder has high performance and significant area-efficiency, which is better than related prior art. This work includes a complete encoding algorithm and the compatible encoders, which are fully compatible with different base graphs of 5G LDPC codes. Therefore, it has more flexible adaptability for various 5G application scenarios.

1. Introduction

Low-density parity-check (LDPC) codes have been recognized for their excellent error correction abilities near the Shannon limit [1], and LDPC codes are advantageous in hardware implementation [2]. At present, many communication systems have taken these codes as standards, such as the Digital Video Broadcasting Satellite (DVB-S2/S2X, Europe) [3], the Consultative Committee for Space Data Systems (CCSDS) [4], Wireless Local Area Network (WLAN, IEEE 802.11n) [5], the China digital radio standard [6], and the 5th Generation Mobile Communication Technology (5G) [7,8].
Currently, research on mobile communication systems has entered the 5G phase [9]. Channel encoding is one of the 5G core technologies; it is mainly used to ensure the correct transmission of channel information and to improve communication quality [10]. The Third Generation Partnership Project (3GPP) organization has finally decided to take the LDPC code as the data channel coding scheme for 5G enhanced Mobile Broadband (eMBB) [11,12]. Compared to 4G, 5G has higher requirements in terms of the data transmission rate and information transmission reliability [13,14]. Therefore, it has important significance and application value for exploring a novel LDPC encoding scheme and implementing 5G LDPC codes [15,16]. To achieve scalable data transmission and flexibility, 3GPP has decided to take two kinds of base graphs for 5G channel encoding—BG1 and BG2 [17].
Presently, some research initiatives have focused on 5G LDPC codes [18,19]. 5G New Radio (NR) has higher performance demands on channel coding solutions [20]; one study discusses the design concept of the new quasi-cyclic low-density parity-check (QC-LDPC) codes that have different structural characteristics and meet the multiple requirements of 5G NR channel coding [21]. Designed as structured LDPC codes, QC-LDPC codes have been research hotspots in the recent past. QC-LDPC codes possess obvious advantages in circuit implementations, and compared to other kinds of LDPC codes, QC-LDPC codes need fewer hardware resources [22].
Owing to the sparsity of the parity-check matrix, the QC-LDPC encoder can be achieved with a low-complexity design. Despite LDPC codes being defined by a parity check matrix, it is difficult to directly realize a low-complexity LDPC encoder as their generator matrix is usually unknown. Some studies have been conducted to acquire low-complexity LDPC encoding. The direct encoding algorithm was originally proposed by Dr. Gallager [23]. It is a general coding algorithm for linear block codes. The method directly uses C = S × G matrix to obtain the code word C (S represents information bits); its theory is simple, but its coding process is complex. The algorithm utilizes Gaussian elimination to transform the check matrix (H matrix) into a generator matrix (G matrix). Its computation amount and the algorithm complexity are high, and the sparsity of the H matrix is damaged in the process. The algorithm needs to store G matrix information in hardware circuits. The consumption of hardware resources is substantially large, so its hardware implementation is difficult. The LU algorithm [24] uses the H matrix to encode directly; it does not need to convert the H matrix into a G matrix, and the H matrix is split into an information bit matrix Hs and a check bit matrix HP. However, this method requires LU decomposition on the HP matrix, that is, H left-multiplies a permutation matrix A, and the determination of A is difficult, so the encoding algorithm is still complex. When the H matrix is a singular matrix, it cannot realize LU decomposition, and the LU algorithm cannot be accomplished. The algorithm is also unsuitable for hardware implementation. The approximate lower triangular matrix encoding algorithm is an effective encoding method named as the RU algorithm [25], which directly applies the H matrix for encoding. The encoding complexity of the RU algorithm is lower than the direct encoding algorithm. 
The RU algorithm converts the check matrix H into an approximate lower triangular matrix by row and column permutation under the condition of known information bits. The check bits are then obtained by using check equations. Finally, the information bits and check bits are connected in series to form the final codeword. A main disadvantage of the RU coding process is that there is no precise programmable step-by-step encoding algorithm. The multiple matrix computations within the RU algorithm obviously limit the design of a fast flexible encoder.
In 5G communication, in order to meet the needs of diverse communication scenarios, the 3GPP organization has considered the compatibility requirements for the characteristics of different scenarios when formulating the standards for 5G LDPC codes. The 5G LDPC standards contain two different base graphs, BG1 and BG2, which correspond to two different base matrices, HBG1 and HBG2. In addition, each base matrix has two sub-matrices B (corresponding to core parity bits), that is, the base matrices of 5G LDPC codes can be further divided into four base matrices.
Although Reference [26] introduces a QC-LDPC encoding structure, its encoding scheme only considers the case of one submatrix B in a single base graph. The paper does not address the encoding compatibility requirements of 5G LDPC codes, so it cannot meet the practical application conditions of different scenarios. Reference [6] introduces the encoding approach of CDR LDPC codes. Combining the characteristics of the generator matrix and the check matrix, it designs a hardware-friendly encoding method. In addition, it adopts an optimized control and storage design in implementing the four LDPC codes specified by the CDR standard. Reference [22] introduces hardware architectures for encoding QC-LDPC codes, which are based on the features of recursively constructed QC-LDPC codes. It uses LU decomposition, and the matrices involved need to be precomputed, compressed, and stored in encoding memories. Reference [27] proposes two encoding architectures that support several code lengths for different applications. The design can meet the requirements of different encoding parameters. Reference [28] proposes a fully parallel LDPC encoder based on reduced-complexity XOR trees; it is designed for the IEEE 802.11n standard. Reference [29] introduces a method to improve hardware multiplication based on constant matrices in GF(2) and applies the method to the QC-LDPC encoding algorithm. Reference [30] describes how the throughput of QC-LDPC codes can be improved by trimming the full base matrix to the requested matrix size.
For high performance and compatibility of 5G LDPC encoding requirements, this work presents a highly area-efficient parallel QC-LDPC encoder core with compatible architecture, which is compatible with the latest 5G standard. It has high encoding performance and a low hardware cost.
The remaining sections of this paper are organized as follows: Section 2 briefly analyzes the characteristics of 5G LDPC codes. Section 3 proposes a high parallel LDPC encoding algorithm compatible with 5G LDPC codes. Section 4 shows a high area-efficient parallel QC-LDPC encoder with compatible architecture. Section 5 gives experimental results and comparative analysis. Section 6 summarizes our work and provides the conclusions.

2. Analysis of 5G LDPC Codes

LDPC codes are adopted as the data encoding scheme of 5G because their higher encoding throughput and lower latency are better suited to the data transmission of high-speed services. The main content of the 5G LDPC standards is analyzed step by step as follows.
The LDPC codes in 5G standards are QC-LDPC codes. For one QC-LDPC code, the structural characteristics of the check matrix can be denoted by a base graph (BG) or a base graph matrix (HBG), as exampled by the check matrix in Equation (1).
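$$
H = \left[\begin{array}{ccc|ccc|ccc|ccc}
0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
\hline
0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & 1 \\
\hline
1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0
\end{array}\right] \tag{1}
$$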
HBG is a base graph matrix associated with the check matrix H.
$$
H_{BG} = \begin{bmatrix}
1 & 1 & 1 & 0 \\
0 & 1 & 1 & 1 \\
1 & 1 & 0 & 1
\end{bmatrix} \tag{2}
$$
In the matrix above, each 1 in HBG represents a 3 × 3 binary circulant permutation matrix (CPM), and each 0 represents a 3 × 3 zero matrix; that is, the lifting size Z of the matrix in Equation (1), which has 9 rows and 12 columns, is 3. The HBG of each QC-LDPC code indicates the positions of the CPMs in the check matrix through the distribution of its ‘1’ elements, providing an important reference for the encoder design of the corresponding codes.
HBG can represent the structural characteristics of the check matrix of one QC-LDPC code. However, it cannot reflect the cyclic shift value of each CPM. Therefore, it is essential to define a cyclic shift coefficient matrix P (exponent matrix) to represent the cyclic shift value of the corresponding base graph matrix.
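$$
P = \begin{bmatrix}
P_{1,1} & P_{1,2} & \cdots & P_{1,n} \\
P_{2,1} & P_{2,2} & \cdots & P_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
P_{m,1} & P_{m,2} & \cdots & P_{m,n}
\end{bmatrix} \tag{3}
$$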
Each Pm,n falls into one of two cases. When 0 ≤ Pm,n < Z, it denotes the circulant permutation matrix obtained by cyclically right-shifting the Z × Z identity matrix by Pm,n positions. When Pm,n = −1, it denotes a Z × Z zero matrix.
Equation (1) corresponds to the cyclic shift coefficient matrix P, which is shown as follows:
$$
P = \begin{bmatrix}
1 & 0 & 2 & -1 \\
-1 & 2 & 1 & 0 \\
0 & 1 & -1 & 2
\end{bmatrix} \tag{4}
$$
Therefore, the check matrix is unique if the lifting size Z and the exponent matrix P of one QC-LDPC code are determined. The description of 5G LDPC codes often adopts this representation method.
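As a minimal sketch of this representation, the following Python function expands an exponent matrix and a lifting size into the full binary check matrix. The function name `expand`, the toy exponent matrix, and the lifting size of 3 are illustrative choices for this sketch; entries of −1 mark zero blocks, and a non-negative entry p places the 1 of block row r in column (r + p) mod Z, which is one reading of "cyclic right shift of the identity matrix".

```python
def expand(P, Z):
    """Expand exponent matrix P (list of rows) into a binary check matrix H.

    -1 denotes a Z x Z zero block; p >= 0 denotes the identity matrix
    cyclically right-shifted by p positions (assumed convention).
    """
    H = []
    for exponent_row in P:
        for r in range(Z):                 # r-th row inside the block row
            line = []
            for p in exponent_row:
                block = [0] * Z
                if p >= 0:
                    block[(r + p) % Z] = 1
                line.extend(block)
            H.append(line)
    return H


# Toy 3 x 4 exponent matrix with lifting size 3 (illustrative values).
P = [[1, 0, 2, -1],
     [-1, 2, 1, 0],
     [0, 1, -1, 2]]
H = expand(P, 3)
for row in H:
    print(" ".join(map(str, row)))
```

Because every block is either zero or a shifted identity, only P and Z need to be stored; H itself never has to be materialized in an encoder.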
In the 5G standard, LDPC codes have two types of base graphs, named BG1 and BG2. Check matrices of BG1 and BG2 both have the characteristic structure shown as Figure 1. These check matrices are termed H matrices.
The 5G LDPC code has two types of base matrices, namely, HBG1 and HBG2. Their information comparison is shown in Table 1.
BG1 has a total of 316 elements equal to 1, while BG2 has a total of 197. An element 1 indicates that the corresponding submatrix is an identity matrix or a cyclically right-shifted identity matrix. An element 0 indicates that the corresponding submatrix is a zero matrix. The cyclic shift coefficients of the submatrices in H are stored in the corresponding coefficient matrix, which has the same size as the HBG matrix. In the cyclic shift coefficient matrix, a non-negative element i corresponds to an element 1 in the HBG, indicating that the submatrix is obtained by cyclically right-shifting an identity matrix by i positions. An element −1 corresponds to an element 0 in the HBG, indicating that the corresponding submatrix is a zero matrix.
The size Z of the submatrix in the H is not fixed. The 5G standards specify the values of Z. For BG1 and BG2, the value ranges of Z are the same, as shown in Table 2.
In 5G standards, the cyclic shift coefficients of the submatrices vary in different situations. First, BG1 and BG2 have different coefficient matrices; furthermore, different values of ‘a’ in Table 2 will result in diverse coefficient matrices even in the same base matrix.
As shown in Table 2, each row of the size Z has the same cyclic shift coefficient matrix, so the size Z can be divided into 8 sets, each of which shares a common coefficient matrix. The Z values after division are shown in Table 3.
When a Z value in one set is taken as the size of each submatrix, the coefficient matrix corresponding to the set can be taken to represent the entire H matrix. Equation (5) indicates the final cycle shift coefficients corresponding to different Z values in the same set:
$$
P_{ij} = \begin{cases}
-1, & \text{if } V_{ij} = -1 \\
\operatorname{mod}(V_{ij}, Z), & \text{otherwise}
\end{cases} \tag{5}
$$
where Vij denotes the element in the i-th row and the j-th column of the coefficient matrix corresponding to one set. Pij denotes the actual cyclic shift coefficient of the submatrix corresponding to the elements in the i-th row and the j-th column of the HBG for a selected Z in one set.
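Equation (5) can be sketched as a one-line helper. The function name `actual_shift` is hypothetical. The example value 105 is the coefficient that appears later in the Pa equations, and the lifting sizes 13, 26, 52, 104, and 208 are assumed here to be the members of the corresponding set in the 5G standard.

```python
def actual_shift(v, z):
    """Actual cyclic shift for lifting size z, given the set-level coefficient v.

    -1 marks a zero block and is passed through unchanged.
    """
    return -1 if v == -1 else v % z


# The same stored coefficient yields different shifts for different Z in one set.
for z in (13, 26, 52, 104, 208):
    print(z, actual_shift(105, z))
```

Storing one coefficient matrix per set and reducing it modulo Z at run time is what lets a single table serve every lifting size in that set.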

3. Parallel LDPC Encoding Algorithm Compatible with 5G LDPC Codes

This paper proposes a highly parallel QC-LDPC encoding algorithm that is compatible with the 5G LDPC standards. Based on this algorithm, a novel encoder architecture for LDPC codes is designed to satisfy the requirements of 5G LDPC codes mentioned above. There are two base graphs for 5G LDPC codes, BG1 and BG2. These base graphs have different structures, as shown in Figure 2 and Figure 3. Because our design is compatible with both BG1 and BG2, this work has wide applicability to the new 5G LDPC standards. Furthermore, we present an integrated solution consisting of the parallel LDPC encoding algorithm and the area-efficient compatible encoder architecture.
As shown in the two figures, the check matrix H is divided into multiple sub-blocks. BG1 and BG2 have different base graphs. BG1 is mainly used for high-performance encoding scenarios. Taking BG1 as an example, the size of HBG1 is 46 × 68. The sizes of the sub-blocks A, B, O, C, D, and I are 4 × 22, 4 × 4, 4 × 42, 42 × 22, 42 × 4, and 42 × 42, respectively. Since the lifting size of HBG is Z, the codeword C is uniformly divided into Z-bit segments to match the base matrix HBG. According to the sub-block structure of HBG, C can be denoted as C = [S1,…, Skb, Pa1,…, Pa4, Pb1,…, Pb(Mb−4)]. The number of columns of the information sequence S is the same as that of block A and block C. The number of columns of the check sequence Pa is the same as that of block B and block D. The number of columns of the check sequence Pb is the same as that of block O and block I.
The encoding of 5G LDPC codes is defined by H·CT = 0T, which can be expressed as follows.
$$
\begin{bmatrix} A & B & O \\ C & D & I \end{bmatrix} \cdot
\begin{bmatrix} S^T \\ P_a^T \\ P_b^T \end{bmatrix} = 0 \tag{6}
$$
(1) First, Equation (6) is decomposed into the following equation set.
$$
\begin{cases}
A \cdot S^T + B \cdot P_a^T = 0 \\
C \cdot S^T + D \cdot P_a^T + I \cdot P_b^T = 0
\end{cases} \tag{7}
$$
(2) Then, Pa and Pb are calculated as follows.
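$$
\begin{cases}
P_a^T = B^{-1} \cdot (A \cdot S^T) \\
P_b^T = \begin{bmatrix} C & D \end{bmatrix} \cdot \begin{bmatrix} S^T \\ P_a^T \end{bmatrix} = C \cdot S^T + D \cdot P_a^T
\end{cases} \tag{8}
$$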
Equation (8) shows that the calculation of the check bits Pa and Pb involves factors of the form α·βT, such as A·ST, C·ST, and D·PaT. However, the calculation of Pa requires an additional matrix multiplication between B−1 and A·ST. Considering the characteristics of the cyclic shift coefficients in the B block, let Var = A·ST, where A is the submatrix of H with known coefficients and S is the input information bit sequence.
$$
\begin{cases}
\mathrm{var}^T = A \cdot S^T = [\mathrm{var}_1, \mathrm{var}_2, \mathrm{var}_3, \mathrm{var}_4]^T \\
P_a = [p_{a1}, p_{a2}, p_{a3}, p_{a4}]
\end{cases} \tag{9}
$$
For BG1 and BG2, there are four different cases of circular shift coefficients corresponding to the B submatrices of HBG matrices, which are shown as Figure 4a,b.
In Equation (7), A·ST + B·PaT = 0, so [var1, var2, var3, var4] and [Pa1, Pa2, Pa3, Pa4] have the following equation relationships.
The computational process of Pa and Var corresponding to the left submatrix of Figure 4a is as follows:
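$$
\begin{cases}
(p_{a1} \ll 1) + p_{a2} + \mathrm{var}_1 = 0 \\
p_{a1} + p_{a2} + p_{a3} + \mathrm{var}_2 = 0 \\
p_{a3} + p_{a4} + \mathrm{var}_3 = 0 \\
(p_{a1} \ll 1) + p_{a4} + \mathrm{var}_4 = 0
\end{cases} \tag{10}
$$
$$
\begin{cases}
\mathrm{var}_1 + \mathrm{var}_2 + \mathrm{var}_3 + \mathrm{var}_4 = p_{a1} \\
[(\mathrm{var}_1 + \mathrm{var}_2 + \mathrm{var}_3 + \mathrm{var}_4) \ll 1] + \mathrm{var}_1 = p_{a2} \\
[(\mathrm{var}_1 + \mathrm{var}_2 + \mathrm{var}_3 + \mathrm{var}_4) \ll 1] + \mathrm{var}_3 + \mathrm{var}_4 = p_{a3} \\
[(\mathrm{var}_1 + \mathrm{var}_2 + \mathrm{var}_3 + \mathrm{var}_4) \ll 1] + \mathrm{var}_4 = p_{a4}
\end{cases} \tag{11}
$$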
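The step from the check equations of this first case to its closed-form solution can be spelled out as a short worked derivation. It is a sketch that assumes all additions are over GF(2), so any term added to itself cancels, and that ≪ denotes a cyclic shift of a Z-bit block. Adding the four check equations gives

$$
\underbrace{(p_{a1} \ll 1) + (p_{a1} \ll 1)}_{=0} + \underbrace{p_{a2} + p_{a2}}_{=0} + \underbrace{p_{a3} + p_{a3}}_{=0} + \underbrace{p_{a4} + p_{a4}}_{=0} + p_{a1} + \sum_{i=1}^{4} \mathrm{var}_i = 0
\;\Rightarrow\; p_{a1} = \sum_{i=1}^{4} \mathrm{var}_i .
$$

Back-substitution then yields the remaining blocks: the first equation gives $p_{a2} = (p_{a1} \ll 1) + \mathrm{var}_1$, the fourth gives $p_{a4} = (p_{a1} \ll 1) + \mathrm{var}_4$, and the third gives $p_{a3} = p_{a4} + \mathrm{var}_3 = (p_{a1} \ll 1) + \mathrm{var}_3 + \mathrm{var}_4$. The other three cases follow the same pattern, with the shifted term surviving the summation instead of the unshifted one where applicable.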
The computational process of Pa and Var corresponding to the left submatrix of Figure 4b is as follows:
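$$
\begin{cases}
p_{a1} + p_{a2} + \mathrm{var}_1 = 0 \\
p_{a2} + p_{a3} + \mathrm{var}_2 = 0 \\
(p_{a1} \ll 1) + p_{a3} + p_{a4} + \mathrm{var}_3 = 0 \\
p_{a1} + p_{a4} + \mathrm{var}_4 = 0
\end{cases} \tag{12}
$$
$$
\begin{cases}
(\mathrm{var}_1 + \mathrm{var}_2 + \mathrm{var}_3 + \mathrm{var}_4) \gg 1 = p_{a1} \\
\mathrm{var}_1 + [(\mathrm{var}_1 + \mathrm{var}_2 + \mathrm{var}_3 + \mathrm{var}_4) \gg 1] = p_{a2} \\
\mathrm{var}_1 + \mathrm{var}_2 + [(\mathrm{var}_1 + \mathrm{var}_2 + \mathrm{var}_3 + \mathrm{var}_4) \gg 1] = p_{a3} \\
\mathrm{var}_4 + [(\mathrm{var}_1 + \mathrm{var}_2 + \mathrm{var}_3 + \mathrm{var}_4) \gg 1] = p_{a4}
\end{cases} \tag{13}
$$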
The computational process of Pa and Var corresponding to the right submatrix of Figure 4a is as follows:
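$$
\begin{cases}
p_{a1} + p_{a2} + \mathrm{var}_1 = 0 \\
(p_{a1} \ll (105 \bmod Z)) + p_{a2} + p_{a3} + \mathrm{var}_2 = 0 \\
p_{a3} + p_{a4} + \mathrm{var}_3 = 0 \\
p_{a1} + p_{a4} + \mathrm{var}_4 = 0
\end{cases} \tag{14}
$$
$$
\begin{cases}
(\mathrm{var}_1 + \mathrm{var}_2 + \mathrm{var}_3 + \mathrm{var}_4) \gg (105 \bmod Z) = p_{a1} \\
[(\mathrm{var}_1 + \mathrm{var}_2 + \mathrm{var}_3 + \mathrm{var}_4) \gg (105 \bmod Z)] + \mathrm{var}_1 = p_{a2} \\
[(\mathrm{var}_1 + \mathrm{var}_2 + \mathrm{var}_3 + \mathrm{var}_4) \gg (105 \bmod Z)] + \mathrm{var}_3 + \mathrm{var}_4 = p_{a3} \\
[(\mathrm{var}_1 + \mathrm{var}_2 + \mathrm{var}_3 + \mathrm{var}_4) \gg (105 \bmod Z)] + \mathrm{var}_4 = p_{a4}
\end{cases} \tag{15}
$$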
The computational process of Pa and Var corresponding to the right submatrix of Figure 4b is as follows:
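$$
\begin{cases}
(p_{a1} \ll 1) + p_{a2} + \mathrm{var}_1 = 0 \\
p_{a2} + p_{a3} + \mathrm{var}_2 = 0 \\
p_{a1} + p_{a3} + p_{a4} + \mathrm{var}_3 = 0 \\
(p_{a1} \ll 1) + p_{a4} + \mathrm{var}_4 = 0
\end{cases} \tag{16}
$$
$$
\begin{cases}
\mathrm{var}_1 + \mathrm{var}_2 + \mathrm{var}_3 + \mathrm{var}_4 = p_{a1} \\
[(\mathrm{var}_1 + \mathrm{var}_2 + \mathrm{var}_3 + \mathrm{var}_4) \ll 1] + \mathrm{var}_1 = p_{a2} \\
[(\mathrm{var}_1 + \mathrm{var}_2 + \mathrm{var}_3 + \mathrm{var}_4) \ll 1] + \mathrm{var}_1 + \mathrm{var}_2 = p_{a3} \\
[(\mathrm{var}_1 + \mathrm{var}_2 + \mathrm{var}_3 + \mathrm{var}_4) \ll 1] + \mathrm{var}_4 = p_{a4}
\end{cases} \tag{17}
$$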
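The four closed-form solutions can be checked numerically against their check equations. The sketch below is not the authors' implementation: the case labels, the function names `solve_pa` and `residuals`, and the bit-list representation are assumptions made for illustration, "<<" and ">>" are modelled as cyclic left and right rotations, and the shift of the right submatrix of Figure 4a is taken as 105 mod Z.

```python
import random


def rotl(v, k):
    """Cyclic left rotation of a bit list by k positions (models '<<')."""
    k %= len(v)
    return v[k:] + v[:k]


def rotr(v, k):
    """Cyclic right rotation (models '>>'), the inverse of rotl."""
    return rotl(v, -k)


def xor(*vs):
    """Bitwise GF(2) sum of equally long bit lists."""
    return [sum(bits) % 2 for bits in zip(*vs)]


def solve_pa(case, var, z):
    """Closed-form Pa blocks for one of the four B submatrix cases."""
    v1, v2, v3, v4 = var
    total = xor(v1, v2, v3, v4)
    if case == "4a_left":
        t = rotl(total, 1)
        return [total, xor(t, v1), xor(t, v3, v4), xor(t, v4)]
    if case == "4b_left":
        p1 = rotr(total, 1)
        return [p1, xor(p1, v1), xor(p1, v1, v2), xor(p1, v4)]
    if case == "4a_right":
        p1 = rotr(total, 105 % z)
        return [p1, xor(p1, v1), xor(p1, v3, v4), xor(p1, v4)]
    if case == "4b_right":
        t = rotl(total, 1)
        return [total, xor(t, v1), xor(t, v1, v2), xor(t, v4)]
    raise ValueError(case)


def residuals(case, pa, var, z):
    """Left-hand sides of the four check equations; all-zero means satisfied."""
    p1, p2, p3, p4 = pa
    v1, v2, v3, v4 = var
    s1, s = rotl(p1, 1), rotl(p1, 105 % z)
    systems = {
        "4a_left": [xor(s1, p2, v1), xor(p1, p2, p3, v2), xor(p3, p4, v3), xor(s1, p4, v4)],
        "4b_left": [xor(p1, p2, v1), xor(p2, p3, v2), xor(s1, p3, p4, v3), xor(p1, p4, v4)],
        "4a_right": [xor(p1, p2, v1), xor(s, p2, p3, v2), xor(p3, p4, v3), xor(p1, p4, v4)],
        "4b_right": [xor(s1, p2, v1), xor(p2, p3, v2), xor(p1, p3, p4, v3), xor(s1, p4, v4)],
    }
    return systems[case]


rng = random.Random(0)
Z = 13  # example lifting size (assumption)
var = [[rng.randint(0, 1) for _ in range(Z)] for _ in range(4)]
for case in ("4a_left", "4b_left", "4a_right", "4b_right"):
    pa = solve_pa(case, var, Z)
    print(case, all(not any(r) for r in residuals(case, pa, var, Z)))
```

Each case needs only one four-way XOR, at most one rotation, and a few two- or three-input XORs, which is the property the configurable Pa network exploits.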
Finally, the check information bits Pa and Pb are obtained, and the encoded codeword C is the output.
$$
\begin{cases}
P_a = [p_{a1}, p_{a2}, p_{a3}, p_{a4}] \\
P_b^T = C \cdot S^T + D \cdot P_a^T \\
C = [S, P_a, P_b]
\end{cases} \tag{18}
$$
For the check matrix H with a special structure in 5G LDPC codes, our scheme avoids matrix inversion operations in encoding. The scheme directly uses the linear relationship between Var and Pa to obtain the check sequence Pa from the intermediate variable Var (which represents A·ST), which simplifies the encoding process. Through the above equations, the scheme uses var1, var2, var3, and var4 to solve for Pa1, Pa2, Pa3, and Pa4. Once Pa has been solved, Pb can be obtained by PbT = C·ST + D·PaT, because C and D are both submatrices with known coefficients in HBG and S is the known information sequence. Finally, the encoded codeword C = [S, Pa, Pb] is obtained by concatenating S, Pa, and Pb. With this LDPC encoding scheme, the hardware implementation of the presented encoder consists mainly of α·βT operation units. This greatly reduces the hardware complexity of the encoder architecture and lays the foundation for the high area-efficiency encoder in this paper.
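The whole scheme can be exercised end to end on a toy code. The following sketch is illustrative only: the lifting size, the numbers of information and extension columns, and the random shift tables A, C, and D are made-up values rather than 5G tables, the B block is the one implied by the first Pa case (left submatrix of Figure 4a), and the helper names are hypothetical. It encodes without any matrix inversion and then confirms that every check row evaluates to zero.

```python
import random


def rotl(v, k):
    """Cyclic left rotation; models multiplying a shifted identity by a Z-bit block."""
    k %= len(v)
    return v[k:] + v[:k]


def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]


def block_row(shifts, blocks, z):
    """XOR of rotated blocks for one block row; a shift of -1 is a zero block."""
    acc = [0] * z
    for s, blk in zip(shifts, blocks):
        if s >= 0:
            acc = xor(acc, rotl(blk, s))
    return acc


# Core parity block for the first Pa case, written as cyclic shift coefficients.
B = [[1, 0, -1, -1], [0, 0, 0, -1], [-1, -1, 0, 0], [1, -1, -1, 0]]


def encode(A, C, D, S, z):
    """Return the codeword [S, Pa, Pb] as a list of Z-bit blocks."""
    var = [block_row(a, S, z) for a in A]                      # Var = A * S^T
    pa1 = xor(xor(var[0], var[1]), xor(var[2], var[3]))
    t = rotl(pa1, 1)
    pa = [pa1, xor(t, var[0]), xor(xor(t, var[2]), var[3]), xor(t, var[3])]
    pb = [xor(block_row(c, S, z), block_row(d, pa, z)) for c, d in zip(C, D)]
    return S + pa + pb


def syndrome(A, C, D, codeword, z):
    """Evaluate every block row of H = [A B O; C D I] on the codeword."""
    n_ext = len(C)
    rows = [a + b + [-1] * n_ext for a, b in zip(A, B)]
    for k, (c, d) in enumerate(zip(C, D)):
        rows.append(c + d + [0 if j == k else -1 for j in range(n_ext)])
    return [block_row(r, codeword, z) for r in rows]


rng = random.Random(0)
Z, KB, N_EXT = 5, 3, 2                                          # toy sizes (assumptions)
A = [[rng.randint(-1, Z - 1) for _ in range(KB)] for _ in range(4)]
C = [[rng.randint(-1, Z - 1) for _ in range(KB)] for _ in range(N_EXT)]
D = [[rng.randint(-1, Z - 1) for _ in range(4)] for _ in range(N_EXT)]
S = [[rng.randint(0, 1) for _ in range(Z)] for _ in range(KB)]

codeword = encode(A, C, D, S, Z)
ok = all(not any(blk) for blk in syndrome(A, C, D, codeword, Z))
print(ok)
```

The encoder body contains only rotations and XORs, which mirrors the claim that α·βT units dominate the hardware.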

4. Area-Efficient Parallel Pipelined QC-LDPC Encoder with Compatible Architecture

Based on the encoding scheme of this paper, α·βT units are the main operation units of the proposed encoder. A·ST, C·ST, and D·PaT are all operations of the form α·βT. According to the quasi-cyclic characteristics of 5G QC-LDPC, one operation unit can process a Z-bit α·βT operation in parallel, that is, it completes the computation of one Z-bit sequence in VarT or PbT. For the HBG1, PaT and PbT have up to 46 column sequences, and each sequence is Z bits wide, so HBG1 needs 46 α·βT operation units. For the HBG2, PaT and PbT have up to 42 column sequences, so HBG2 needs 42 α·βT operation units. To make the proposed encoder fully support 5G LDPC codes, the encoder provides 46 operation units so that it is compatible with both HBG1 and HBG2, and the operation units are distributed across the Var generation module and the Pb generation module. Furthermore, the cyclic shift operation and the XOR operation are combined to replace the α·βT computation, avoiding the complicated multiply-accumulate operations required by a direct α·βT computation. This design not only significantly reduces the complexity and hardware cost of the encoder, but also greatly improves the computing efficiency.
Based on the above analysis, this work has designed a high area-efficient parallel QC-LDPC encoder with compatible architecture, which is shown in Figure 5. The encoder mainly consists of a serial-to-parallel information input buffer, a Var generation module, a configurable Pa generation module, a parallel Pb generation module, a cyclic shift coefficient memory module, and an encoding controller.
The upper and middle encoding modules (Var module and Pa module) of the encoder correspond to the high code rate of LDPC encoding used to generate the Pa check bits. The Pb encoding module corresponds to the extended matrix region in H to generate extended check bits. By selecting the number of enabled Pb operation units, the length of Pb check bits can be adjusted to determine the code rate of the encoded codeword. Thus, the encoder architecture can adapt to different code rates of LDPC encoding.

4.1. Information Input Buffer

The information sequence S is input into the buffer, and the buffer sends each Z-bit information sequence to the Var operation units in parallel. The input buffer is implemented by a register set that contains two Z-bit registers. Under the control of the controller signal, while one register supplies the Z-bit Si sequence to the Var generation units in parallel, the other register preloads the next sequence Si+1. This structure enables the encoder to read in the next information frame during the encoding of the present information frame, which saves information reading time and improves the throughput of the encoder.
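The two-register arrangement behaves like a ping-pong buffer. The following behavioral sketch is a software model under assumptions, not a description of the actual circuit: the class name `PingPongBuffer`, its methods, and the `stream` driver are hypothetical, and one loop iteration stands in for one hand-over between registers.

```python
class PingPongBuffer:
    """Two Z-bit registers: one drives the Var units while the other preloads."""

    def __init__(self):
        self.regs = [None, None]
        self.active = 0  # index of the register currently driving the Var units

    def preload(self, block):
        """Write the next block into the idle register."""
        self.regs[1 - self.active] = block

    def swap(self):
        """Make the preloaded register active and return its contents."""
        self.active = 1 - self.active
        return self.regs[self.active]


def stream(blocks):
    """Feed blocks through the buffer and return them in the order they are consumed."""
    buf = PingPongBuffer()
    consumed = []
    it = iter(blocks)
    first = next(it, None)
    if first is None:
        return consumed
    buf.preload(first)
    current = buf.swap()
    for nxt in it:
        buf.preload(nxt)          # S_{i+1} is loaded while S_i is in use
        consumed.append(current)  # S_i is presented to the Var units
        current = buf.swap()
    consumed.append(current)
    return consumed


print(stream([[1, 0, 1], [0, 1, 1], [1, 1, 0]]))
```

Because loading and consumption overlap, the operation units never wait for input after the first block has been loaded.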

4.2. Cyclic Shift Coefficient Memory Module

Based on the structure characteristics of H matrices for 5G LDPC codes, the proposed encoder only needs to store the cyclic shift coefficient values corresponding to the A, C, and D submatrices in the Flash ROM. The memory module for other submatrices can be omitted. The A, C and D submatrices correspond to A_Block, C_Block, and D_Block, respectively. Moreover, the proposed encoder does not need to store the specific content of the H matrix.
The structure of the HBG1 is shown in Figure 2. The sizes of the A, C, and D submatrices are 4 × 22, 42 × 22, and 42 × 4, respectively. The cyclic shift coefficients of each row in A or [C, D] are stored in a ROM block, respectively. A total of 46 ROM blocks are required for the encoder. Among them, the coefficients corresponding to the A submatrix are stored in 4 ROMs, and each ROM stores 22 coefficients; the coefficients corresponding to Set [C, D] are stored in 42 ROM blocks, and each block stores 26 coefficient values. The front 22 values denote the coefficients corresponding to the C submatrix while the last 4 values denote the coefficients corresponding to the D submatrix, as shown in Figure 6.
The structure of the HBG2 is shown in Figure 3. The sizes of the A, C, and D submatrices are 4 × 10, 38 × 10, and 38 × 4, respectively. The coefficients of each row in A or Set [C, D] are stored in a separate ROM block. A total of 42 ROM blocks are required for the memory module. Among them, the coefficients corresponding to the A submatrix require 4 ROM blocks for storage, and each block stores 10 coefficients. The coefficients corresponding to Set [C, D] require 38 ROM blocks for storage, and each block stores 14 coefficient values; the first 10 values denote the coefficients corresponding to the C submatrix while the last 4 values denote the coefficients corresponding to the D submatrix, as shown in Figure 6.
In summary, each coefficient matrix of the HBG1 requires 46 ROM blocks for the coefficient memory, and each coefficient matrix of the HBG2 requires 42 ROM blocks. The coefficient memory of the encoder is composed of 46 ROM blocks so that it is compatible with both HBG1 and HBG2. The bit width of the coefficients in the ROM blocks is determined by the coefficient values of the known encoding algorithm.
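The ROM organization above can be tallied with a few lines of arithmetic. This is a sketch under the stated layout (one ROM block per row of A and per row of [C, D]); the function name `rom_layout` and the dictionary keys are hypothetical, and the per-coefficient bit width is left out because the text does not give it.

```python
def rom_layout(kb, mb):
    """ROM organization for a base matrix with kb information columns and mb rows."""
    a_roms = 4            # one ROM block per row of the A submatrix
    cd_roms = mb - 4      # one ROM block per row of [C, D]
    return {
        "rom_blocks": a_roms + cd_roms,
        "a_depth": kb,            # coefficients per A ROM block
        "cd_depth": kb + 4,       # C coefficients followed by 4 D coefficients
        "coefficients": a_roms * kb + cd_roms * (kb + 4),
    }


bg1 = rom_layout(kb=22, mb=46)
bg2 = rom_layout(kb=10, mb=42)

# A shared memory must cover the larger requirement in each dimension.
shared_blocks = max(bg1["rom_blocks"], bg2["rom_blocks"])
shared_depth = max(bg1["cd_depth"], bg2["cd_depth"])
print(bg1, bg2, shared_blocks, shared_depth)
```

Sizing the shared memory by the maximum over both base graphs is what makes 46 ROM blocks sufficient for either one.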

4.3. Encoding Operation Unit

As the core operation unit of the encoder, the encoding operation unit is mainly used to realize operations in the α·βT form. The Var operation module containing α·βT encoding units is mainly used to execute the A·ST operation, while the Pb generation module containing α·βT units is mainly used to execute the operation process of [C D]·[S Pa]T. The circuit structure of the encoding operation unit is shown in Figure 7. The operation unit consists of a Z-bit barrel shift register, a row of XOR gates and a Z-bit state register.
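A behavioral model of one such unit is sketched below, with a Z-bit word held in a Python integer. The class name `OperationUnit` and its methods are hypothetical, and the rotation direction is an assumption that depends on how bits are ordered within the word; a coefficient of −1 is treated as a zero block that leaves the state unchanged.

```python
class OperationUnit:
    """Barrel shifter + XOR row + Z-bit state register (behavioral model)."""

    def __init__(self, z):
        self.z = z
        self.mask = (1 << z) - 1
        self.state = 0  # state register, cleared before each codeword

    def rotate(self, word, shift):
        """Cyclic shift of a Z-bit word, as a barrel shifter would perform it."""
        shift %= self.z
        return ((word << shift) | (word >> (self.z - shift))) & self.mask

    def step(self, word, shift):
        """Accumulate one shifted block into the state register."""
        if shift >= 0:  # -1 denotes a zero block: nothing to add
            self.state ^= self.rotate(word, shift)
        return self.state


unit = OperationUnit(z=4)
unit.step(0b0001, 1)   # first column
unit.step(0b1000, 1)   # second column, XORed into the running sum
print(format(unit.state, "04b"))
```

One call to `step` per column replaces a full GF(2) matrix-vector product with a rotation and an XOR, which is the substitution the architecture relies on.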

4.4. Var Generation Module

The module integrates four encoding units, which are used to realize the operation of A·ST in Equation (9). The four encoding units consist of four α·βT operation units (Z-bit granularity), which correspond to var1, var2, var3, and var4 in turn. When the Var generation module receives a Z-bit data sequence from the information input buffer, the sequence is transmitted synchronously, bit for bit, to the Z-bit barrel shift registers of var1, var2, var3, and var4. At the same time, the barrel shift registers read the coefficients at the corresponding positions of the A_ROM, and each barrel shift register then cyclically shifts the data sequence by its coefficient. This completes the Aij·SjT operation for one column of Z-bit data, that is, the four operations A1j·SjT, A2j·SjT, A3j·SjT, and A4j·SjT are computed in parallel (i denotes the row of the A submatrix; j denotes the column of the A submatrix). Each Aij·SjT result is XORed with the current value of the Z-bit state register (equivalent to a binary addition), and the unit replaces the value in its state register with the new XOR result, which represents the execution of an Aij·SjT + Ai(j+1)·S(j+1)T operation. The four operation units var1, var2, var3, and var4 execute these Aij·SjT + Ai(j+1)·S(j+1)T operations in parallel; for example, the four expressions A11·S1T + A12·S2T, A21·S1T + A22·S2T, A31·S1T + A32·S2T, and A41·S1T + A42·S2T are computed in parallel, with the initial value of each state register set to 0. The Var generation module repeats this process until the four-way ∑Aij·SjT computation is complete, which realizes the four sums ∑A1j·SjT, ∑A2j·SjT, ∑A3j·SjT, and ∑A4j·SjT.

4.5. Configurable Pa Generation Module

As shown in Figure 5, the Var generation module can generate four output results of var1, var2, var3, and var4 after computing. By inputting the results to the corresponding interfaces of the configurable Pa computation network, it can generate four check information blocks, that is, Pa1, Pa2, Pa3, and Pa4.
The new 5G LDPC standards correspond to two sets of base graphs, BG1 and BG2. As shown in Figure 4, each base matrix (HBG) has two kinds of B submatrices. In other words, the HBG of 5G standards can be further divided into four base matrices. The Pa computation network is innovatively designed as a configurable circuit structure in line with the specific parameters of the four kinds of B submatrices, so that the proposed encoder can be fully compatible with the four base matrices. The four submatrices correspond to four different Pa computation processes. To implement the compatible encoder, the configurable Pa computation network is designed after a detailed analysis of the computational processes and path characteristics. The computation network consists of XOR units, configurable circular shift registers, data multiplexers, and a configurable circuit network. The circuit structure is shown in Figure 2 above. It can be compatible with the computational requirements of the four B submatrices, which means that the computation network can flexibly adapt to BG1 and BG2. Based on the configurable Pa computation network and the intermediate result Var, this encoder can flexibly implement the following four different computation processes to obtain the Pa sequence.
The four computation processes have been listed in the proposed algorithm, which will not be repeated here. The circuit paths of Pa computation network are as follows:
The computing paths of Pa corresponding to the left submatrix of Figure 4a are as follows:
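$$
\begin{cases}
p_{a1} = \sum_{i=1}^{4} \mathrm{var}_i \\
p_{a2} = \mathrm{var}_1 + \left(\sum_{i=1}^{4} \mathrm{var}_i\right) \ll 1 \\
p_{a3} = \mathrm{var}_3 + \mathrm{var}_4 + \left(\sum_{i=1}^{4} \mathrm{var}_i\right) \ll 1 \\
p_{a4} = \mathrm{var}_4 + \left(\sum_{i=1}^{4} \mathrm{var}_i\right) \ll 1
\end{cases} \tag{19}
$$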
The computing paths of Pa corresponding to the left submatrix of Figure 4b are as follows:
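$$
\begin{cases}
p_{a1} = \left(\sum_{i=1}^{4} \mathrm{var}_i\right) \gg 1 \\
p_{a2} = \mathrm{var}_1 + \left(\sum_{i=1}^{4} \mathrm{var}_i\right) \gg 1 \\
p_{a3} = \mathrm{var}_1 + \mathrm{var}_2 + \left(\sum_{i=1}^{4} \mathrm{var}_i\right) \gg 1 \\
p_{a4} = \mathrm{var}_4 + \left(\sum_{i=1}^{4} \mathrm{var}_i\right) \gg 1
\end{cases} \tag{20}
$$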
The computing paths of Pa corresponding to the right submatrix of Figure 4a are as follows:
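$$
\begin{cases}
p_{a1} = \left(\sum_{i=1}^{4} \mathrm{var}_i\right) \gg (105 \bmod Z) \\
p_{a2} = \mathrm{var}_1 + \left(\sum_{i=1}^{4} \mathrm{var}_i\right) \gg (105 \bmod Z) \\
p_{a3} = \mathrm{var}_3 + \mathrm{var}_4 + \left(\sum_{i=1}^{4} \mathrm{var}_i\right) \gg (105 \bmod Z) \\
p_{a4} = \mathrm{var}_4 + \left(\sum_{i=1}^{4} \mathrm{var}_i\right) \gg (105 \bmod Z)
\end{cases} \tag{21}
$$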
The computing paths of Pa corresponding to the right submatrix of Figure 4b are as follows:
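$$
\begin{cases}
p_{a1} = \sum_{i=1}^{4} \mathrm{var}_i \\
p_{a2} = \mathrm{var}_1 + \left(\sum_{i=1}^{4} \mathrm{var}_i\right) \ll 1 \\
p_{a3} = \mathrm{var}_1 + \mathrm{var}_2 + \left(\sum_{i=1}^{4} \mathrm{var}_i\right) \ll 1 \\
p_{a4} = \mathrm{var}_4 + \left(\sum_{i=1}^{4} \mathrm{var}_i\right) \ll 1
\end{cases} \tag{22}
$$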
Pa sequence register (PaSR): PaSR accepts the Pa check blocks from the configurable Pa computation network in parallel, and then stores them in a register set composed of 4 dual-port RAMs. According to the output signal, PaSR sends the four check information blocks (Pa1, Pa2, Pa3, and Pa4) to the output port of the encoder, and they are stored in the corresponding positions of the encoding memory, which belongs to the peripheral main system.

4.6. Parallel Pb Generation Module

The Pb generation module is mainly composed of (Mb − 4) encoding operation units, where Mb represents the number of rows of the HBG. The module implements the operation PbT = C·ST + D·PaT in Equation (18) using (Mb − 4) α·βT units with Z-bit granularity. The structure of the encoding units in the Pb generation module is similar to that of the Var generation module. During the generation of the Pb check sequences, the computing process of the Pb generation module can be divided into two main steps:
(1)
The first step completes the computation of C·ST, which is executed synchronously with the computation process of the Var generation module. The Pb generation module receives the Z-bit data from the information input buffer and transmits it to the (Mb − 4) operation units in parallel. At the same time, under the control signal, the cyclic shift coefficients are read from the corresponding positions in the C_ROM and transmitted to the Pb generation module. The (Mb − 4) operation units process the data sequence in parallel with the corresponding coefficients to complete the Cij·SjT computation (i denotes the row of the C submatrix, and j denotes the column of the C submatrix). That is, the units complete the parallel computation of C1j·SjT, C2j·SjT, C3j·SjT, …, C(Mb−4)j·SjT. Each Cij·SjT value is then XORed with the current value of the state register in the corresponding operation unit, thus accumulating the Cij·SjT values. The result is as follows:
$$C_{ij} \cdot S_j^T + \sum_{J=1}^{j-1} C_{iJ} \cdot S_J^T \tag{23}$$
The (Mb − 4) operation units will execute the operation of Equation (23) in parallel:
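$$
\begin{array}{l}
C_{1j} \cdot S_j^T + \sum_{J=1}^{j-1} C_{1J} \cdot S_J^T \\
C_{2j} \cdot S_j^T + \sum_{J=1}^{j-1} C_{2J} \cdot S_J^T \\
C_{3j} \cdot S_j^T + \sum_{J=1}^{j-1} C_{3J} \cdot S_J^T \\
\qquad \vdots \\
C_{(M_b-4)j} \cdot S_j^T + \sum_{J=1}^{j-1} C_{(M_b-4)J} \cdot S_J^T
\end{array} \tag{24}
$$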
The computation process of Equation (24) is executed in parallel. Taking BG1 as an example, the size of the submatrix C is 42 × 22:
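$$
\begin{cases}
C_{11} \cdot S_1^T + C_{12} \cdot S_2^T + C_{13} \cdot S_3^T + \cdots + C_{1,22} \cdot S_{22}^T = \sum_{J=1}^{22} C_{1J} \cdot S_J^T \\
C_{21} \cdot S_1^T + C_{22} \cdot S_2^T + C_{23} \cdot S_3^T + \cdots + C_{2,22} \cdot S_{22}^T = \sum_{J=1}^{22} C_{2J} \cdot S_J^T \\
C_{31} \cdot S_1^T + C_{32} \cdot S_2^T + C_{33} \cdot S_3^T + \cdots + C_{3,22} \cdot S_{22}^T = \sum_{J=1}^{22} C_{3J} \cdot S_J^T \\
\qquad \vdots \\
C_{(M_b-4)1} \cdot S_1^T + C_{(M_b-4)2} \cdot S_2^T + C_{(M_b-4)3} \cdot S_3^T + \cdots + C_{(M_b-4),22} \cdot S_{22}^T = \sum_{J=1}^{22} C_{(M_b-4)J} \cdot S_J^T
\end{cases}
$$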
Lastly, the computational results of the (Mb − 4) operation units are obtained. In this way, the C·ST operation on a data sequence S = {S1, S2, S3, ···, S22} is completed in parallel. The length of a data unit Sj is Z bits.
(2) The second step executes the D·PaT operation and completes the computation of PbT = C·ST + D·PaT. The Pb operation units 1 to (Mb − 4) receive the check bits Pa(j) generated by the Pa generation module in parallel. At the same time, the cyclic shift coefficients at the corresponding positions of the D_ROM are sent to the Pb operation units. One column of cyclic shift coefficients is read from the D_ROM each time (the number of coefficients in a column equals the number of Pb operation units), and the coefficients of the column are sent to the Pb operation units synchronously. Each operation unit immediately applies the cyclic shift to the Z-bit Pa check block. These shifts replace the multiplications of D·PaT and yield D1j·Pa(j), D2j·Pa(j), ···, D(Mb−4)j·Pa(j). The (Mb − 4) resulting values are then XORed in parallel with the current values of the state registers in the Pb operation units. Completing PbT = C·ST + D·PaT requires only 4 clock cycles after the operations in the first step are finished.
$$
\begin{cases}
P_{b1}^T = D_{11} \cdot P_{a1}^T + D_{12} \cdot P_{a2}^T + D_{13} \cdot P_{a3}^T + D_{14} \cdot P_{a4}^T + \sum_{J=1}^{22} C_{1J} \cdot S_J^T \\
P_{b2}^T = D_{21} \cdot P_{a1}^T + D_{22} \cdot P_{a2}^T + D_{23} \cdot P_{a3}^T + D_{24} \cdot P_{a4}^T + \sum_{J=1}^{22} C_{2J} \cdot S_J^T \\
P_{b3}^T = D_{31} \cdot P_{a1}^T + D_{32} \cdot P_{a2}^T + D_{33} \cdot P_{a3}^T + D_{34} \cdot P_{a4}^T + \sum_{J=1}^{22} C_{3J} \cdot S_J^T \\
\qquad \vdots \\
P_{b(M_b-4)}^T = D_{(M_b-4)1} \cdot P_{a1}^T + D_{(M_b-4)2} \cdot P_{a2}^T + D_{(M_b-4)3} \cdot P_{a3}^T + D_{(M_b-4)4} \cdot P_{a4}^T + \sum_{J=1}^{22} C_{(M_b-4)J} \cdot S_J^T
\end{cases}
$$
Finally, all check bits of Pb, namely the check sequences Pb = {Pb1, Pb2, Pb3, ···, Pb(Mb−4)}, are obtained in parallel. The Pb generation module transmits Pb to the output port, from which it is written to the encoding memory of the peripheral main system.
In the encoding memory, an encoded codeword consists of the information bits (S) and the check bits (Pa and Pb); codewords are transmitted in the form {S, Pa, Pb} in batches.
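The two steps of the Pb generation module can be sketched as a small behavioural model. This is only an illustrative sketch under stated assumptions, not the RTL of the encoder: the function names (`cyclic_shift`, `xor_into`, `generate_pb`), the list-of-bits block representation, the use of −1 to mark an all-zero submatrix, and the shift direction (output bit i takes input bit (i + k) mod Z) are all assumptions introduced here.

```python
from typing import List, Sequence

Block = List[int]  # one Z-bit block, stored as a list of 0/1 values


def cyclic_shift(block: Sequence[int], k: int) -> Block:
    """Apply a Z x Z circulant with shift coefficient k to a Z-bit block.

    Assumptions (not taken from the paper): k = -1 marks an all-zero
    submatrix, and output bit i takes input bit (i + k) mod Z.
    """
    z = len(block)
    if k < 0:
        return [0] * z
    k %= z
    return list(block[k:]) + list(block[:k])


def xor_into(state: Sequence[int], value: Sequence[int]) -> Block:
    """Accumulate over GF(2): XOR a new value into a state register."""
    return [a ^ b for a, b in zip(state, value)]


def generate_pb(
    c_shifts: Sequence[Sequence[int]],
    d_shifts: Sequence[Sequence[int]],
    s_blocks: Sequence[Sequence[int]],
    pa_blocks: Sequence[Sequence[int]],
) -> List[Block]:
    """Compute Pb^T = C*S^T + D*Pa^T block row by block row."""
    z = len(s_blocks[0])
    units = len(c_shifts)  # one entry per operation unit, (Mb - 4) in total
    state = [[0] * z for _ in range(units)]

    # Step 1: one information block S_j (one column of C) per iteration.
    for j, s_j in enumerate(s_blocks):
        for i in range(units):  # these units work in parallel in hardware
            state[i] = xor_into(state[i], cyclic_shift(s_j, c_shifts[i][j]))

    # Step 2: one check block Pa_j (one column of D) per iteration.
    for j, pa_j in enumerate(pa_blocks):
        for i in range(units):
            state[i] = xor_into(state[i], cyclic_shift(pa_j, d_shifts[i][j]))

    return state
```

For BG1, `c_shifts` would be a 42 × 22 table and `d_shifts` a 42 × 4 table, so the two outer loops run 22 and 4 times; the inner loop over the units is what the hardware performs in a single clock cycle.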

4.7. Controller Module

The controller module is responsible for the control function of the encoder. It generates the signals that direct the functional modules of the encoder to carry out the encoding operations correctly. Its main signals include the encoding control signal, memory control signal, input/output control signal, and circuit configuration signal.

5. Experimental Results and Comparative Analysis

5.1. Comparison of Encoding Methods

The proposed encoder architecture is fully compatible with BG1 and BG2. It can adapt to LDPC codes with different lifting sizes Z in these two base graphs. This work implements BG1 and BG2 encoders with two sets of lifting sizes Z, which verifies the performance indicators of the new encoder architecture as well as the area efficiency of each encoder. The ASIC post-synthesis implementation results in 65 nm CMOS technology are shown in Section 5.2.
First, the proposed encoders are compared with other LDPC encoding schemes. Table 4 quantitatively compares the proposed encoder with related prior art. We normalized the ASIC implementations to the 65 nm process, and the comparative results for the ASIC implementations are based on the normalized data.
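The normalization to 65 nm follows the rule given in the footnote of Table 4: areas scale as λ² and frequencies as 1/λ. The sketch below reproduces that arithmetic. The function names are hypothetical, and two points are assumptions inferred from the starred values in Table 4 rather than stated in the text: that λ = 65 / (original process in nm), and that throughput is scaled in the same way as frequency.

```python
TARGET_NM = 65.0  # process node that all ASIC results are normalized to


def scale_ratio(process_nm: float) -> float:
    """lambda: ratio of the target process (65 nm) to the original process."""
    return TARGET_NM / process_nm


def scale_area(area_mm2: float, process_nm: float) -> float:
    """Areas are scaled as lambda squared."""
    return area_mm2 * scale_ratio(process_nm) ** 2


def scale_frequency(freq: float, process_nm: float) -> float:
    """Frequencies are scaled as 1 / lambda.

    Assumption: throughput is proportional to clock frequency, so the same
    function is used here to scale throughput values.
    """
    return freq / scale_ratio(process_nm)


if __name__ == "__main__":
    # 90 nm entries of Table 4: 154 MHz and 0.032 mm^2
    print(round(scale_frequency(154, 90)), round(scale_area(0.032, 90), 3))
```

With these definitions, the 90 nm entries of Table 4 (154 MHz, 0.032 mm²) map to the starred values 213 MHz and 0.017 mm², and the 28 nm frequency of 400 MHz maps to 172 MHz.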
Reference [4] introduces a new structure for multiplying a bit vector by a dense QC matrix, which is the basic operation of LDPC encoding. With an improved scheduling, the encoding architecture exploits the parallelism of the LDPC codes by processing multiple bits concurrently. Based on this design, it proposes an encoder architecture for CCSDS codes with the applicable encoding methods. Compatibility across different CCSDS codes is left for further research, which is expected to improve flexibility and reduce cost. Compared with that architecture, this work has about 2–30 times the throughput. Its implementation occupies 8945 LUTs and 12,420 flip-flops of the FPGA (Virtex-7). In terms of occupied resources, its resource efficiency is also lower than that of this work.
Reference [6] introduces an encoder scheme based on the LDPC generator matrix for frequency modulation China digital radio (CDR). It exploits the features of the parity matrix to parallelize the encoding operations over rows and columns. It also introduces an approach to controlling the memories that can be applied to codes of several rates to improve the utilization of circuit resources. Its implementation occupies 32,479 LUTs, 32,313 registers, and 36 block RAMs of the FPGA (Spartan-6). Compared with that encoder, this work clearly improves both throughput and resource efficiency.
Although Reference [26] proposes a QC-LDPC encoder structure for 5G NR, its encoding scheme only considers the case of one submatrix B in a single base graph. The scheme does not address the encoding compatibility required by practical applications of 5G LDPC. For 5G and B5G communication, however, 3GPP and major corporations have emphasized compatibility across diverse applications and have built it into the 5G LDPC standard to serve various scenarios. 5G LDPC codes involve two base graphs, HBG1 and HBG2, which differ significantly and can be further divided into four base matrices according to four different submatrices B. This work provides a complete, highly compatible algorithm, and the proposed encoder is fully compatible with the different base graphs of 5G LDPC codes, so it adapts more flexibly to various 5G application scenarios. In addition, this work offers higher performance and higher area efficiency: compared with the scheme in [26], it achieves about 1.6–2.8 times the throughput and 1.4–2.5 times the area efficiency when the proposed encoders are implemented in different sizes. These advantages translate into lower latency and lower application cost.
Reference [27] proposes two encoding hardware designs for Irregular Repeat Accumulate (IRA) LDPC codes, which can be used in communication and memory systems. The first is a reconfigurable architecture suitable for applications that frequently switch among a finite set of codes. The second exploits the sparsity of the parity-check matrices and reduces the circuit cost by storing the matrix in memory in sparse form. Compared with these architectures, this work has more than 9 times the throughput and clearly better resource efficiency.
Reference [31] uses Gaussian elimination to design the check matrix; the encoded codeword is obtained by matrix multiplication based on the generator matrix and the codeword. It proposes a regular LDPC encoder with pipelined structures to obtain a compact encoding process. However, the matrix multiplication is relatively complex even for a small block size, and the memory overhead rises rapidly for larger blocks and for multiple data frames. Compared with this method, this work has about 10–100 times the throughput and 4–8 times the area efficiency (Throughput/Area) when implementing the LDPC encoders.
An LDPC encoder for CMMB based on the RU algorithm is presented in [32]. The RU method applies a modified greedy algorithm to bring the sparse matrix into approximate triangular form. This optimized algorithm can reduce the encoding complexity. However, implementing the RU method requires a series of calculations with data dependences between the computation steps, which limits parallelism. In addition, the method has a long critical path, which makes the encoder implementation unsuitable for high-performance scenarios. The LDPC encoder is implemented on a Stratix II FPGA and consumes 60% of the memory resources and 4% of the logic resources of the chip. Compared with this design, this work has significant advantages in throughput and resource efficiency.
Reference [33] proposes a nonbinary QC-LDPC encoding architecture and introduces two methods that use the finite Fourier transform to reduce hardware complexity. In that paper, a GF(2²) QC-LDPC encoder is implemented, and the relevant parameters of the result are normalized to the 65 nm process. Compared with that scheme, this work has more than 12.6 times the throughput and 1.8–3.7 times the area efficiency.

5.2. Implementation Comparison of BG1 and BG2

As the core of one LDPC code, BG determines the macro characteristics and encoding performance of the LDPC code. There are two sets of base graphs in the 5G NR standard, namely, BG1 and BG2. In order to satisfy the needs of different communication scenarios, 5G LDPC codes should be able to flexibly support different encoding parameters. Considering the diversity of future communication scenarios, new encoders should be compatible with BG1 and BG2.
For the two base graphs, we choose multiple lifting sizes Z that are distributed approximately uniformly from small to large. For BG1, the encoders use lifting sizes Z of 28, 60, 128, 240, and 384, which are recorded as BG1(28), BG1(60), BG1(128), BG1(240), and BG1(384) in Table 5. For BG2, the encoders use lifting sizes Z of 64, 120, 224, and 352, which are recorded as BG2(64), BG2(120), BG2(224), and BG2(352) in Table 6.
In Table 5 and Table 6, Length denotes the word length in bits required to match the lifting size Z. ECC denotes the clock cycles required by an encoder to complete the encoding of a codeword. By default, all data bits of the input information need to be encoded. For the proposed encoder, ECC is equal to the total number of clock cycles required to generate the Pa and Pb check bits corresponding to a codeword.
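Each chosen lifting size belongs to one of the eight sets of Table 3, where Z = a × 2^j; this set index is what the "Matrix Set" rows of Table 5 and Table 6 report. The following sketch, with a hypothetical helper name `lifting_set`, looks up the set index of a given Z from the (a, maximum j) pairs listed in Table 3.

```python
# Set index -> (a, largest j), transcribed from Table 3 (Z = a * 2**j).
LIFTING_SETS = {
    1: (2, 7),
    2: (3, 7),
    3: (5, 6),
    4: (7, 5),
    5: (9, 5),
    6: (11, 5),
    7: (13, 4),
    8: (15, 4),
}


def lifting_set(z: int) -> int:
    """Return the index of the lifting-size set that contains z."""
    for index, (a, j_max) in LIFTING_SETS.items():
        if any(a * 2 ** j == z for j in range(j_max + 1)):
            return index
    raise ValueError(f"{z} is not a 5G LDPC lifting size")


if __name__ == "__main__":
    for z in (28, 60, 128, 240, 384, 64, 120, 224, 352):
        print(z, lifting_set(z))
```

For example, 28 = 7 × 2² falls in Set 4 and 384 = 3 × 2⁷ falls in Set 2, matching the "Matrix Set" row of Table 5.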
Based on the parallel pipelined encoding architecture, the proposed encoder needs a total of kb + 4 clock cycles to generate check sequences (Pa and Pb). In addition, it needs another 2 clock cycles to input the information sequence and output the encoded codeword. Therefore, the proposed encoder needs a total of kb + 6 clock cycles to complete the encoding of an information sequence. The throughput rate of check bits is an important index, which represents the encoder performance. The throughput rates of different Z sizes are shown in Table 5 and Table 6. Since the actual output sequences of the encoder are check sequences, the throughputs of check sequences are recorded as T-P in the tables, and the throughputs of corresponding information sequences are recorded as T-S. Their computation equations for the proposed encoder are as follows:
$$T\text{-}P = \frac{Z \times f \times M_b}{ECC}$$
where Mb denotes the number of rows of the base matrix, Z denotes the lifting size, and f denotes the operating frequency of the encoder. ECC denotes the clock cycles required to complete the encoding of a codeword, with ECC = kb + 6. For BG1, T-P ranges from 34.3 Gbps to 362.7 Gbps for different Z sizes. For BG2, T-P ranges from 121.8 Gbps to 541.5 Gbps for different Z sizes. The comparison shows that when the Z sizes are similar, T-P(BG2) is significantly higher than T-P(BG1). This is mainly because a BG1 encoder requires kb + 6 = 28 clock cycles to encode a codeword, whereas a BG2 encoder needs only kb + 6 = 16 clock cycles; that is, ECC(BG1) is greater than ECC(BG2). Therefore, for the same Z size, T-P(BG2) is higher than T-P(BG1).
The computational equation of the corresponding data information is as follows:
$$T\text{-}S = \frac{Z \times f \times k_b}{ECC}$$
where kb is equal to the number of columns in the submatrix A, and the submatrix A corresponds to S sequences. Z and f have the same meanings as defined in T-P. For BG1, T-S ranges from 16.4 Gbps to 173.5 Gbps for different Z sizes. For BG2, T-S ranges from 29.0 Gbps to 128.9 Gbps for different Z sizes. The comparison shows that when the Z sizes are similar, T-S(BG1) is larger than T-S(BG2). This is because kb(BG1) is significantly larger than kb(BG2), resulting in a gap in information encoding performance between them. It is also in line with the reason why 5G LDPC standards formulate two sets of base graphs. BG1 is mainly used for scenarios that require high data throughput; BG2 is mainly used for scenarios with lower requirements for data throughput.
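The two throughput formulas can be checked against the tabulated values with a few lines of code. This is a minimal sketch: the function and dictionary names are hypothetical, Mb is taken from Table 1, kb = 22 for BG1 follows from the 42 × 22 size of submatrix C, and kb = 10 for BG2 is inferred from ECC = kb + 6 = 16.

```python
# Base-graph parameters: Mb from Table 1; kb such that ECC = kb + 6.
BASE_GRAPHS = {
    "BG1": {"Mb": 46, "kb": 22},
    "BG2": {"Mb": 42, "kb": 10},
}


def encoding_cycles(kb: int) -> int:
    """ECC: kb + 4 cycles for Pa and Pb, plus 2 cycles for input/output."""
    return kb + 6


def throughput_gbps(z: int, f_mhz: float, blocks: int, ecc: int) -> float:
    """Z * f * blocks / ECC, with f in MHz and the result in Gbps."""
    return z * f_mhz * blocks / ecc / 1000.0


def t_p(bg: str, z: int, f_mhz: float) -> float:
    """Throughput of the check sequences (T-P)."""
    p = BASE_GRAPHS[bg]
    return throughput_gbps(z, f_mhz, p["Mb"], encoding_cycles(p["kb"]))


def t_s(bg: str, z: int, f_mhz: float) -> float:
    """Throughput of the information sequences (T-S)."""
    p = BASE_GRAPHS[bg]
    return throughput_gbps(z, f_mhz, p["kb"], encoding_cycles(p["kb"]))


if __name__ == "__main__":
    print(round(t_p("BG1", 384, 575), 1), round(t_s("BG1", 384, 575), 1))
    print(round(t_p("BG2", 64, 725), 1), round(t_s("BG2", 64, 725), 1))
```

With Z = 384 and f = 575 MHz for BG1, this gives T-P ≈ 362.7 Gbps and T-S ≈ 173.5 Gbps, in agreement with Table 5.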
The two tables show the synthesized areas of the encoder architecture for different Z sizes. The encoder area grows as Z increases. This is because the codeword length increases with Z, so an encoder needs larger memory modules and more logic units. The comparison shows that the complexity of the proposed encoder is positively related to the lifting size Z. To compare the area efficiency of the proposed encoder fairly across different Z sizes, Table 5 and Table 6 use AE to represent the area efficiency, which is expressed as follows:
$$AE = \frac{\text{Throughput}}{\text{Area}}$$
In addition, AE-P denotes the area efficiency with respect to the check bits generated by the encoder, and AE-S denotes the area efficiency with respect to the information bits encoded by the encoder. For BG1, AE-P ranges from 879 Gbps/mm² (Z = 28) down to 710 Gbps/mm² (Z = 384). For BG2, AE-P ranges from 1450 Gbps/mm² (Z = 64) down to 1245 Gbps/mm² (Z = 352). When the Z sizes are similar, AE-P(BG2) is higher than AE-P(BG1), mainly because T-P(BG2) is higher than T-P(BG1). For BG1, AE-S ranges from 421 Gbps/mm² down to 339 Gbps/mm². For BG2, AE-S ranges from 345 Gbps/mm² down to 296 Gbps/mm². AE-S(BG1) is higher than AE-S(BG2) because T-S(BG1) is higher than T-S(BG2).
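As a quick check of the area-efficiency definition, the sketch below divides tabulated throughputs by tabulated areas. The helper name `area_efficiency` and the `ROWS` list are introduced here for illustration; the numbers are the rounded T-P and area values of Table 5 and Table 6, so the last digit of a recomputed AE can differ slightly from the tabulated one.

```python
def area_efficiency(throughput_gbps: float, area_mm2: float) -> float:
    """AE = Throughput / Area, in Gbps per square millimetre."""
    return throughput_gbps / area_mm2


# (label, T-P in Gbps, area in mm^2), taken from Table 5 and Table 6
ROWS = [
    ("BG1(28)", 34.3, 0.039),
    ("BG1(384)", 362.7, 0.511),
    ("BG2(64)", 121.8, 0.084),
    ("BG2(352)", 541.5, 0.435),
]

if __name__ == "__main__":
    for label, tp, area in ROWS:
        print(f"{label}: AE-P = {area_efficiency(tp, area):.0f} Gbps/mm^2")
```

The same function applied to the T-S rows gives the AE-S values.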
By analyzing the experimental data, it can be concluded that the new architecture encoder is compatible with two sets of base graphs, BG1 and BG2. The implemented encoders can flexibly adapt to submatrix sizes with various granularities. Their performance and area-efficiency are significantly high. The encoder architecture can not only meet the encoding requirements of 5G NR, but also achieve higher encoding performance.

6. Conclusions

This paper presents a highly compatible parallel LDPC encoding algorithm that conforms to the 5G LDPC standards, and, based on this algorithm, implements a high area-efficient parallel encoder with a compatible architecture for 5G LDPC codes. The proposed encoder has the advantages of parallel encoding and pipelined operation, and it uses a configurable encoding structure. Therefore, the encoder architecture adapts flexibly to 5G LDPC codes. Based on the encoder architecture, we implemented nine encoders for different Z sizes distributed over the two base graphs. The experimental results show that the proposed encoder has high performance and significant area efficiency, and that it compares favorably with the related prior art. These results indicate that the encoding scheme can satisfy the requirements of current 5G LDPC codes and can be further applied to future communication scenarios with higher encoding requirements.

Author Contributions

Conceptualization, Y.Z. (Yufei Zhu); Funding acquisition, Z.X.; Investigation, Y.Z. (Yufei Zhu); Methodology, Y.Z. (Yufei Zhu) and Z.X.; Project administration, Z.X.; Resources, Y.Z. (Yang Zhang) and Y.H.; Software, Y.Z. (Yufei Zhu) and Z.L.; Writing—original draft, Y.Z. (Yufei Zhu); Writing—review & editing, Y.Z. (Yang Zhang) and Y.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the National Natural Science Foundation of China, grant number 61874140, and by the Hunan Science Project Foundation, grant number 2018xk2102.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. MacKay, A.D. Good error-correcting codes based on very sparse matrices. IEEE Trans. Inf. Theory 1999, 45, 399–431.
  2. Tsatsaragkos, I.; Paliouras, V. A reconfigurable LDPC decoder optimized for 802.11n/ac applications. IEEE Trans. Very Large Scale Integr. Syst. 2018, 26, 182–195.
  3. Lazarenko, A.V. FPGA design and implementation of DVB-S2/S2X LDPC encoder. In Proceedings of the 2019 IEEE International Conference on Electrical Engineering and Photonics, St. Petersburg, Russia, 17–18 October 2019; pp. 98–102.
  4. Theodoropoulos, D.; Kranitis, N.; Tsigkanos, A. Efficient architectures for multigigabit CCSDS LDPC encoders. IEEE Trans. Very Large Scale Integr. Syst. 2020, 28, 1118–1127.
  5. Patil, P.; Patil, M.; Itraj, S. IEEE 802.11n: Joint modulation-coding and guard interval adaptation scheme for throughput enhancement. Int. J. Commun. Syst. 2020, 33, 1–14.
  6. Chen, D.; Chen, P.; Fang, Y. Low-complexity high-performance low-density parity-check encoder design for China digital radio standard. IEEE Access 2017, 5, 20880–20886.
  7. 5G PPP Architecture Working Group. View on 5G Architecture; White Paper, v 1.0; European Union: Luxembourg, 2016.
  8. Fuentes, M.; Carcel, J.L.; Dietrich, C. 5G new radio evaluation against IMT-2020 key performance indicators. IEEE Access 2020, 8, 110880–110896.
  9. Gour, R.; Ishigaki, G.; Kong, J. Availability-guaranteed slice composition for service function chains in 5G transport networks. J. Opt. Commun. Netw. 2021, 13, 14–24.
  10. Komal, A.; Jaswinder, S.; Yogeshwar, S. A survey on channel coding techniques for 5G wireless networks. Telecommun. Syst. 2020, 73, 637–663.
  11. Multiplexing and Channel Coding. Document TS 38.212 V15.0.0, 3GPP. 2017. Available online: https://www.3gpp.org/ftp/Specs/archive/38_series/38.212/ (accessed on 3 January 2018).
  12. Hui, D.; Sandbeg, S.; Blank, Y. Channel coding in 5G new radio: A tutorial overview and performance comparison with 4G LTE. IEEE Veh. Technol. Mag. 2018, 13, 60–69.
  13. Richardson, T.; Kudekar, S. Design of low-density parity check codes for 5G new radio. IEEE Commun. Mag. 2018, 56, 28–34.
  14. Huo, Y.; Dong, X.; Xu, W. 5G cellular user equipment: From theory to practical hardware design. IEEE Access 2017, 5, 13992–14009.
  15. Zhou, W.; Lentmaier, M. Generalized two-magnitude check node updating with self correction for 5G LDPC codes decoding. In Proceedings of the 12th ITG Conference, Rostock, Germany, 11–14 February 2019; pp. 1–6.
  16. Li, J.; Lin, S.; Ab, K. LDPC Code Designs, Constructions, and Unification; Cambridge University Press: Cambridge, UK, 2017.
  17. Multiplexing and Channel Coding (Release 15). Document TS 38.212, V15.4.0, 3GPP. December 2018. Available online: https://www.3gpp.org/ftp/Specs/archive/38_series/38.212/ (accessed on 11 January 2019).
  18. Li, H.; Bai, B.; Mu, X. Algebra-assisted construction of quasi-cyclic LDPC codes for 5G new radio. IEEE Access 2018, 6, 50229–50244.
  19. Zhang, Y.; Peng, K.; Wang, X. Performance analysis and code optimization of IDMA with 5G new radio LDPC code. IEEE Commun. Lett. 2018, 8, 1552–1555.
  20. NR; Physical Layer Procedures for Data (Release 15). Document TS 38.214, 3GPP. 2018. Available online: https://www.3gpp.org/ftp/Specs/archive/38_series/38.214/ (accessed on 9 April 2018).
  21. Bae, J.H.; Abotabl, A.; Lin, H.P. An overview of channel coding for 5G NR cellular communications. APSIPA Trans. Signal Inf. Process. 2019, 8, 1–14.
  22. Mahdi, A.; Paliouras, V. A low complexity-high throughput QC-LDPC encoder. IEEE Trans. Signal Process. 2014, 62, 2696–2708.
  23. Heyun, H.E. Principle and Application of LDPC; Post & Telecom Press: Beijing, China, 2009; pp. 147–151.
  24. Mathur, J.P.S.; Pandey, A. FER performance analysis and optimization of diagonal structure based QC-LDPC codes with girth 12 using LU decomposition. J. Electr. Eng. Technol. 2020, 15, 1405–1412.
  25. Richardson, T.J.; Urbanke, R.L. Efficient encoding of low-density parity-check codes. IEEE Trans. Inf. Theory 2001, 47, 638–656.
  26. Tram, T.B.; Tuy, N.T.; Hanho, L. Efficient QC-LDPC encoder for 5G new radio. Electronics 2019, 8, 1–15.
  27. Talati, N.; Wang, Z.; Kvatinsky, S. Rate-compatible and high-throughput architecture designs for encoding LDPC Codes. In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), Baltimore, MD, USA, 28–31 May 2017; pp. 1–4.
  28. Mahdi, N.K.; Paliouras, V. A multirate fully parallel LDPC encoder for the IEEE 802.11n/ac/ax QC-LDPC codes based on reduced complexity XOR trees. IEEE Trans. Very Large Scale Integr. Syst. 2021, 29, 51–64.
  29. Tzimpragos, G.; Kachris, C.; Soudris, D.; Tomkos, I. A low-complexity implementation of QC-LDPC encoder in reconfigurable logic. In Proceedings of the 2013 23rd International Conference on Field Programmable Logic and Applications, IEEE, Porto, Portugal, 2–4 September 2013; pp. 1–4.
  30. Wu, H.; Wang, H. A High Throughput Implementation of QC-LDPC codes for 5G NR. IEEE Access 2019, 7, 185373–185384.
  31. Nandalal, V.; Kumar, V.A. Design and analysis of (5, 10) regular LDPC encoder using MRP technique. Wirel. Pers. Commun. 2021, 1, 1–17.
  32. Zhibin, Z.; Xiangran, S. A new LDPC encoder for CMMB based on RU algorithm. International conference on applied physics and industrial engineering (ICAPIE). Phys. Proc. China 2012, 24, 864–870.
  33. Zhang, X.; Tai, Y. Low-complexity transformed encoder architectures for quasi-cyclic nonbinary LDPC codes over subfields. IEEE Trans. Very Large Scale Integr. Syst. 2017, 25, 1342–1351.
Figure 1. Structure of 5G LDPC check matrix.
Figure 2. Base matrix structure of BG1 (316 Elements 1).
Figure 3. Base matrix structure of BG2 (197 Elements 1).
Figure 4. Cyclic shift coefficients of B submatrices for HBG1 (a) and HBG2 (b).
Figure 5. High Area-efficient parallel QC-LDPC encoder with compatible architecture.
Figure 6. Cyclic Shift Coefficient Memory Module.
Figure 7. Encoding Operation Unit.
Table 1. Comparison of HBG matrices.
| HBG | Size | Mb | Nb | Element 1 |
|---|---|---|---|---|
| BG1 | 46 × 68 | 46 | 68 | 316 |
| BG2 | 42 × 52 | 42 | 52 | 197 |
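Table 2. Lifting Size Z in 5G LDPC codes.
| a \ j | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| 2 | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256 |
| 3 | 3 | 6 | 12 | 24 | 48 | 96 | 192 | 384 |
| 5 | 5 | 10 | 20 | 40 | 80 | 160 | 320 | |
| 7 | 7 | 14 | 28 | 56 | 112 | 224 | | |
| 9 | 9 | 18 | 36 | 72 | 144 | 288 | | |
| 11 | 11 | 22 | 44 | 88 | 176 | 352 | | |
| 13 | 13 | 26 | 52 | 104 | 208 | | | |
| 15 | 15 | 30 | 60 | 120 | 240 | | | |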
Table 3. Sets of Lifting Size Z.
| Set | Lifting sizes |
|---|---|
| Set 1 | Z = 2 × 2^j, j = 0, 1, 2, 3, 4, 5, 6, 7 |
| Set 2 | Z = 3 × 2^j, j = 0, 1, 2, 3, 4, 5, 6, 7 |
| Set 3 | Z = 5 × 2^j, j = 0, 1, 2, 3, 4, 5, 6 |
| Set 4 | Z = 7 × 2^j, j = 0, 1, 2, 3, 4, 5 |
| Set 5 | Z = 9 × 2^j, j = 0, 1, 2, 3, 4, 5 |
| Set 6 | Z = 11 × 2^j, j = 0, 1, 2, 3, 4, 5 |
| Set 7 | Z = 13 × 2^j, j = 0, 1, 2, 3, 4 |
| Set 8 | Z = 15 × 2^j, j = 0, 1, 2, 3, 4 |
Table 4. Comparison information.
Work[4][6][26][27][31][32][33]This Work
TypeCCSDSCDR5G NRIRAGaussRUGF(22) QC5G
Implemented CodesQC-LDPCLDPCQC-LDPC
(B1 of BG1)
LDPCLDPCQC-LDPCQC-LDPCQC-LDPC
(Fully Compatible with BG1 and BG2)
Process
(nm)
2845652890902865
Comparison PlatformFPGAFPGAASICASICASICFPGAASICASIC
Frequency
(MHz)
495200600 (Z352)154
(213) *
200400
(172) *
575 (BG1-Z384)
725 (BG2-Z64)
Area
(Resource) (mm2)
35 × 35
8945 LUTs
+ 12,420 FFs
23 × 23
32,479 LUTs + 36 BRAM
0.38958.5 K Gates+ (512 × 64) SRAM0.032
(0.017) *
529
60% memory 4% logic
8.66 k Gates (0.007) *0.511 (Z384)
0.084 (Z64)
Throughput (Gbps)15.840.4202.43.57
(1.54) *
2.31
(3.20) *
0.0696.3
(2.71) *
362.7 (BG1)
121.8 (BG2)
Throughput/
Area (Gbps/mm2)
LowLow520Low188 *Low387 *710 (BG1)
1450 (BG2)
* Areas are scaled as λ², frequencies are scaled as 1/λ, where λ is the process-size ratio for 65 nm.
Table 5. Result comparison of ASIC implementation for lifting size Z = 28, 60, 128, 240, 384.
| Proposed Encoder ¹ | BG1(28) | BG1(60) | BG1(128) | BG1(240) | BG1(384) |
|---|---|---|---|---|---|
| Matrix Set | 4 | 8 | 1 | 8 | 2 |
| Lifting size Z | 28 | 60 | 128 | 240 | 384 |
| Length | 5 | 6 | 7 | 8 | 9 |
| ECC (clock cycles) | 28 | 28 | 28 | 28 | 28 |
| Frequency (MHz) | 746 | 709 | 658 | 610 | 575 |
| Area (mm²) | 0.039 | 0.086 | 0.182 | 0.328 | 0.511 |
| Equivalent Gates | 48.5 K | 107.1 K | 226.9 K | 410.3 K | 639.5 K |
| T-P (Gbps) | 34.3 | 69.9 | 138.4 | 240.5 | 362.7 |
| AE-P (Gbps/mm²) | 879 | 813 | 760 | 733 | 710 |
| T-S (Gbps) | 16.4 | 33.4 | 66.2 | 115.0 | 173.5 |
| AE-S (Gbps/mm²) | 421 | 388 | 364 | 351 | 339 |

¹ ASIC post-synthesis implementation results on 65 nm CMOS technology.
Table 6. Result comparison of ASIC implementation for lifting size Z = 64, 120, 224, 352.
| Proposed Encoder ¹ | BG2(64) | BG2(120) | BG2(224) | BG2(352) |
|---|---|---|---|---|
| Matrix Set | 1 | 8 | 4 | 6 |
| Lifting size Z | 64 | 120 | 224 | 352 |
| Length | 6 | 7 | 8 | 9 |
| ECC (clock cycles) | 16 | 16 | 16 | 16 |
| Frequency (MHz) | 725 | 672 | 631 | 586 |
| Area (mm²) | 0.084 | 0.156 | 0.282 | 0.435 |
| Equivalent Gates | 104.6 K | 195.1 K | 352.9 K | 545.2 K |
| T-P (Gbps) | 121.8 | 211.7 | 371.0 | 541.5 |
| AE-P (Gbps/mm²) | 1450 | 1357 | 1316 | 1245 |
| T-S (Gbps) | 29.0 | 50.4 | 88.3 | 128.9 |
| AE-S (Gbps/mm²) | 345 | 323 | 313 | 296 |

¹ ASIC post-synthesis implementation results on 65 nm CMOS technology.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Zhu, Y.; Xing, Z.; Li, Z.; Zhang, Y.; Hu, Y. High Area-Efficient Parallel Encoder with Compatible Architecture for 5G LDPC Codes. Symmetry 2021, 13, 700. https://doi.org/10.3390/sym13040700