Article

Research and Implementation of High-Efficiency and Low-Complexity LDPC Coding Algorithm

1
School of Electronic Information and Electrical Engineering, Chengdu University, Chengdu 610106, China
2
Sichuan Time Frequency Synchronization System and Application Engineering Technology Research Center, Chengdu 610106, China
3
Chengdu Jinjiang Electronic System Engineering Co., Ltd., Chengdu 610051, China
4
The Fifth Research Institute of Telecommunications Science and Technology Co., Ltd., Chengdu 610021, China
*
Authors to whom correspondence should be addressed.
Electronics 2023, 12(17), 3696; https://doi.org/10.3390/electronics12173696
Submission received: 26 July 2023 / Revised: 25 August 2023 / Accepted: 29 August 2023 / Published: 1 September 2023

Abstract

In this work, we propose a high-efficiency and low-complexity encoding algorithm, together with its implementation structure, developed during the design and implementation of an LDPC encoder and decoder. The proposal is derived from an analysis of the standard encoding algorithm and the recursive-iterative encoding algorithm, and it specifically targets the high computational complexity of encoding. We then combined binary phase-shift keying (BPSK) modulation and additive white Gaussian noise (AWGN) channel transmission with the min-sum decoding algorithm to realize a (1536, 1024) LDPC codec. The codec uses the 2/3 code rate of the IEEE802.16e standard, (6, 2) uniform quantization, and eight decoding iterations. At a bit error rate (BER) of $10^{-5}$, the signal-to-noise ratio required by the codec using the proposed coding algorithm was about 0.25 dB lower than with the recursive-iterative coding algorithm and about 1.25 dB lower than with the standard coding algorithm, which confirms the correctness, effectiveness, and feasibility of the proposed algorithm.

1. Introduction

LDPC codes are linear block codes defined by sparse check matrices, and their performance approaches the Shannon limit. They are flexible, have a low error floor, and exhibit low decoding complexity. Their highly adaptable structure allows fully parallel operation, and their hardware implementation complexity is low, enabling high throughput. LDPC codes also lend themselves to high-speed decoding and show excellent error-correction performance against burst errors. Notably, LDPC codes do not require correlation during the encoding and decoding processes. A single LDPC code can be used across a wide range of channels, and the codes have undergone rigorous theoretical analysis. Compared with Turbo codes, LDPC codes perform better in terms of distance properties, complexity, and the degree of parallelism available in decoding [1].
Initially proposed by Gallager in 1962, LDPC codes, despite their superior performance as linear block codes, did not attract significant research attention because of the limited research and implementation conditions at the time. It was only after 1993, when Berrou et al. [2] developed Turbo codes and achieved significant research results, that LDPC codes were brought back to researchers' attention. In 1996, MacKay et al. [2] conducted in-depth investigations and analysis of LDPC codes, revealing their superior coding performance, which even surpassed that of Turbo codes, particularly for long code lengths [3].
The check matrix of LDPC codes is characterized by sparsity, allowing them to overcome the enormous decoding operations in the case of long block codes. Consequently, efficient decoding algorithms can be applied. Additionally, LDPC codes integrate flexible structures, low decoding complexity, complete concurrent operation, and straightforward hardware implementation [4]. This paper, motivated by the advantages of LDPC codes and aiming to address the high computational complexity of standard coding algorithms, proposes a high-efficiency and low-complexity coding algorithm. Unlike traditional rapid coding algorithms, our proposed approach is achieved by analyzing and studying the check matrices. Experimental evaluation demonstrates that the developed algorithm effectively reduces implementation complexity while ensuring desirable coding performance.

2. LDPC Code in the IEEE802.16e Standard

The LDPC code in the IEEE802.16e standard [5] is obtained by the quasi-cyclic construction method, affording high coding performance, low computational complexity, and easy hardware implementation [6,7,8,9]. LDPC codes with the same code rate but different lengths have identically structured check matrices, each obtained by expanding the basic check matrix with square submatrices [4]. The basic check matrix of the LDPC code is denoted as $H_b$, of dimension $m_b \times n_b$. Each element of $H_b$ corresponds to a submatrix of the check matrix $H$ ($H$ is obtained simply by replacing and expanding the elements of $H_b$). The number of columns of $H_b$ is fixed at 24, i.e., $n_b = 24$. For the standard LDPC code with a 2/3 code rate, $H_b$ has eight rows, i.e., $m_b = 8$. The basic check matrix $H_b$ of the LDPC code in the standard has the form:
$$H_b = \left[\, H_{b1} \mid H_{b2} \,\right] \tag{1}$$
In Equation (1), $H_{b1}$ has size $m_b \times k_b$ ($k_b = n_b - m_b$) and $H_{b2}$ has size $m_b \times m_b$ [8].
In the submatrices $H_{b1}$ and $H_{b2}$ of $H_b$, each element is either −1 or a non-negative integer. When $H_b$ is expanded into $H$, each element with value −1 is replaced by a $z \times z$ zero matrix, and each element with a non-negative integer value is replaced by the square matrix obtained by cyclically shifting a $z \times z$ unit matrix [8]. Here $z = n / n_b$ is the expansion factor, $n$ denotes the length of the LDPC code, and $n_b$ is the number of columns of $H_b$. The submatrix $H_{b2}$ has the following structural characteristics:
In the first column of $H_{b2}$, the values of $h(1)$, $h(r)$, and $h(m_b)$ are non-negative integers, with $h(1) = h(m_b)$ and $r \in [2, m_b - 1]$. Except for the first row, the last row, and row $r$, the elements of the first column are −1. Apart from the first column, the remaining columns form a quasi-double-diagonal structure (Equation (2)). The basic check matrix of the LDPC code with a code rate of 2/3 in the standard is displayed in Equation (3).
$$H_{b2} = \begin{bmatrix} h(1) & 0 & -1 & \cdots & \cdots & -1 \\ -1 & 0 & 0 & -1 & \cdots & -1 \\ \vdots & & \ddots & \ddots & & \vdots \\ h(r) & -1 & \cdots & 0 & 0 & -1 \\ \vdots & & & \ddots & \ddots & \vdots \\ -1 & -1 & \cdots & -1 & 0 & 0 \\ h(m_b) & -1 & \cdots & \cdots & -1 & 0 \end{bmatrix} \tag{2}$$
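$$H_b = \left[\begin{array}{rrrrrrrrrrrrrrrrrrrrrrrr}
3 & 0 & -1 & -1 & 2 & 0 & -1 & 3 & 7 & -1 & 1 & 1 & -1 & -1 & -1 & -1 & 1 & 0 & -1 & -1 & -1 & -1 & -1 & -1 \\
-1 & -1 & 1 & -1 & 36 & -1 & -1 & 34 & 10 & -1 & -1 & 18 & 2 & -1 & 3 & 0 & -1 & 0 & 0 & -1 & -1 & -1 & -1 & -1 \\
-1 & -1 & 12 & 2 & -1 & 15 & -1 & 40 & -1 & 3 & -1 & 15 & -1 & 2 & 13 & -1 & -1 & -1 & 0 & 0 & -1 & -1 & -1 & -1 \\
-1 & -1 & 19 & 24 & -1 & 3 & 0 & -1 & 6 & -1 & 17 & -1 & -1 & -1 & 8 & 39 & -1 & -1 & -1 & 0 & 0 & -1 & -1 & -1 \\
20 & -1 & 6 & -1 & -1 & 10 & 29 & -1 & -1 & 28 & -1 & 14 & -1 & 38 & -1 & -1 & 0 & -1 & -1 & -1 & 0 & 0 & -1 & -1 \\
-1 & -1 & 10 & -1 & 28 & 20 & -1 & -1 & 8 & -1 & 36 & -1 & 9 & -1 & 21 & 45 & -1 & -1 & -1 & -1 & -1 & 0 & 0 & -1 \\
35 & 25 & -1 & 37 & -1 & 21 & -1 & -1 & 5 & -1 & -1 & 0 & -1 & 4 & 20 & -1 & -1 & -1 & -1 & -1 & -1 & -1 & 0 & 0 \\
-1 & 6 & 6 & -1 & -1 & -1 & 4 & -1 & 14 & 30 & -1 & 3 & 36 & -1 & 14 & -1 & 1 & -1 & -1 & -1 & -1 & -1 & -1 & 0
\end{array}\right] \tag{3}$$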
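The expansion rule described in this section (−1 becomes a $z \times z$ zero block, a non-negative entry becomes a cyclically shifted unit matrix) can be sketched as follows. This is a minimal illustration on a hypothetical toy base matrix, not the standard's matrix; the function names are invented for the example, and reducing each shift modulo $z$ is an assumption of the sketch, since the rule the standard applies to shorter code lengths is not reproduced here.

```python
def shifted_identity(z, shift):
    """z x z identity matrix whose columns are cyclically shifted right by `shift`."""
    return [[1 if col == (row + shift) % z else 0 for col in range(z)]
            for row in range(z)]


def expand_base_matrix(base, z):
    """Expand a base matrix into a binary check matrix (list of rows).

    Entries equal to -1 become z x z zero blocks; non-negative entries become
    shifted identity blocks (shift reduced modulo z: an assumption here).
    """
    expanded = []
    for base_row in base:
        blocks = []
        for entry in base_row:
            if entry < 0:
                blocks.append([[0] * z for _ in range(z)])
            else:
                blocks.append(shifted_identity(z, entry % z))
        # Concatenate the r-th row of every block to form one row of H.
        for r in range(z):
            expanded.append([bit for block in blocks for bit in block[r]])
    return expanded


# Hypothetical 2 x 3 base matrix with expansion factor z = 3.
toy_base = [[1, -1, 0],
            [-1, 2, 0]]
H = expand_base_matrix(toy_base, z=3)
for row in H:
    print(row)
```

For the (1536, 1024) code discussed later, the same rule would be applied with $z = 1536 / 24 = 64$.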

3. LDPC Coding Algorithm

3.1. Standard Coding Algorithm

The check matrix of the LDPC code is denoted as $H$ and has dimension $m \times n$. The information bit length, the information codeword (CW) vector, and the post-coding CW vector are $k$, $u$, and $c$, respectively [5,6,7,8,9]. After Gaussian elimination, the check matrix $H$ takes the form:
$$H = \left[\, P_{m \times k} \quad I_{m \times m} \,\right]$$
The following generator matrix is obtained from the check constraint $G H^T = 0$:
$$G = \left[\, I_{k \times k} \quad P^T_{k \times m} \,\right]$$
When $u$ is known, $c$ is obtained through $c = uG$, which completes the coding of the input information.
With the code rate $R = k / n$, generating one frame of code requires the following numbers of operations.
The number of multiplication operations is:
$$kn = Rn^2$$
The number of additive operations is:
$$(k - 1)n = R\left(n - \frac{1}{2R}\right)^2 - \frac{1}{4R}$$
The computational complexity of the standard coding algorithm is therefore $O(n^2)$. When the information sequence contains tens of thousands of bits, the coding complexity significantly affects implementation. Consequently, the standard LDPC coding algorithm is rarely used in practice [9,10,11,12,13].
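As a small illustration of the generator-matrix approach, the sketch below encodes a word with $G = [I \; P^T]$ and checks it against $H = [P \; I]$. The matrix `P_toy` is a hypothetical 3 × 4 example chosen only for the demonstration, and the function names are not taken from the paper. The last lines evaluate the two operation counts for the (1536, 1024) code.

```python
def encode_systematic(u, P):
    """Return c = uG for G = [I | P^T]; P is the m x k block of H = [P | I]."""
    k = len(u)
    parity = [sum(P[i][j] & u[j] for j in range(k)) % 2 for i in range(len(P))]
    return u + parity


def syndrome(H, c):
    """Return H c^T over GF(2)."""
    return [sum(h & b for h, b in zip(row, c)) % 2 for row in H]


# Hypothetical P (m = 3, k = 4); H is assembled as [P | I].
P_toy = [[1, 1, 0, 1],
         [1, 0, 1, 1],
         [0, 1, 1, 1]]
H_toy = [row + [1 if i == j else 0 for j in range(3)] for i, row in enumerate(P_toy)]

codeword = encode_systematic([1, 0, 1, 1], P_toy)
print(codeword, syndrome(H_toy, codeword))

# Operation counts from the formulas in the text for n = 1536, k = 1024.
n, k = 1536, 1024
multiplications = k * n
additions = (k - 1) * n
print(multiplications, additions)
```

Both counts are on the order of $1.6 \times 10^6$ per frame, which is the quadratic growth that the later algorithms try to avoid.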

3.2. Recursive-Iterative Coding Algorithm

The recursive-iterative coding algorithm exploits the quasi-double-diagonal structure of the check matrix, which simplifies the coding operations to a certain extent [5,14,15,16,17]. The number of check bits generated in the coding process is denoted as $m$, the check vector as $p$, the post-coding CW length as $n$, and the CW vector as $c$. $Z_q$ is a $z \times z$ square matrix: a zero matrix when $q = -1$, or a unit matrix cyclically shifted to the right by $q$ positions when $q$ is a non-negative integer. The algorithm flow is described as follows:
① $u$ and $p$ are segmented into blocks of length $z$:
$$u = \left[\, u_1^T \; u_2^T \; \cdots \; u_{k_b}^T \,\right] \tag{6}$$
$$p = \left[\, p_1^T \; p_2^T \; \cdots \; p_{m_b}^T \,\right] \tag{7}$$
In Equations (6) and (7):
$$u_i = \left[\, u((i-1)z + 1) \;\; u((i-1)z + 2) \;\; \cdots \;\; u(iz) \,\right]^T, \quad i = 1, 2, \ldots, k_b \tag{8}$$
$$p_i = \left[\, p((i-1)z + 1) \;\; p((i-1)z + 2) \;\; \cdots \;\; p(iz) \,\right]^T, \quad i = 1, 2, \ldots, m_b \tag{9}$$
② All $u_i$ and $p_i$ are spliced longitudinally to obtain:
$$\left[\, u_1 \; u_2 \; \cdots \; u_{k_b} \,\right]^T = u^T \tag{10}$$
$$\left[\, p_1 \; p_2 \; \cdots \; p_{m_b} \,\right]^T = p^T \tag{11}$$
Each sub-vector $p_i$ of the check vector $p$ is obtained through the coding process, after which the CW is formed as:
$$c = \left[\, u \;\; p \,\right] \tag{12}$$
Writing $H = [H_1 \; H_2]$, the check equation $H c^T = 0$ gives $H_2 p^T = H_1 u^T$. The element located at position $(i, j)$ in the submatrix $H_{b1}$ of the basic check matrix $H_b$ is denoted as $H_{b1}(i, j)$. According to the forms of $Z_q$ and $H_{b2}$, the following is obtained:
$$\begin{bmatrix} Z_{h(1)} & Z_0 & Z_{-1} & \cdots & \cdots & Z_{-1} \\ Z_{-1} & Z_0 & Z_0 & Z_{-1} & \cdots & Z_{-1} \\ \vdots & & \ddots & \ddots & & \vdots \\ Z_{h(r)} & Z_{-1} & \cdots & Z_0 & Z_0 & Z_{-1} \\ \vdots & & & \ddots & \ddots & \vdots \\ Z_{-1} & Z_{-1} & \cdots & Z_{-1} & Z_0 & Z_0 \\ Z_{h(m_b)} & Z_{-1} & \cdots & \cdots & Z_{-1} & Z_0 \end{bmatrix} \begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_{m_b} \end{bmatrix} = \begin{bmatrix} Z_{H_{b1}(1,1)} & Z_{H_{b1}(1,2)} & \cdots & Z_{H_{b1}(1,k_b)} \\ Z_{H_{b1}(2,1)} & Z_{H_{b1}(2,2)} & \cdots & Z_{H_{b1}(2,k_b)} \\ \vdots & \vdots & & \vdots \\ Z_{H_{b1}(m_b,1)} & Z_{H_{b1}(m_b,2)} & \cdots & Z_{H_{b1}(m_b,k_b)} \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_{k_b} \end{bmatrix} \tag{13}$$
First, Equation (13) is expanded into $m_b$ equations, which are added together to obtain:
$$p_1 = \left( Z_{h(1)} + Z_{h(r)} + Z_{h(m_b)} \right)^{-1} \sum_{i=1}^{m_b} \sum_{j=1}^{k_b} Z_{H_{b1}(i,j)} u_j \tag{14}$$
$p_1$ is then substituted into Equation (13), and from the first equation ($Z_{h(1)} p_1 + Z_0 p_2 = \sum_{j=1}^{k_b} Z_{H_{b1}(1,j)} u_j$) we obtain:
$$p_2 = \sum_{j=1}^{k_b} Z_{H_{b1}(1,j)} u_j + Z_{h(1)} p_1 \tag{15}$$
$p_2$ is then substituted into Equation (13), and from the second equation ($Z_0 p_2 + Z_0 p_3 = \sum_{j=1}^{k_b} Z_{H_{b1}(2,j)} u_j$) we calculate $p_3$. The remaining sub-vectors are calculated similarly, which gives the iterative formulas for $p_i$:
$$p_i = p_{i-1} + \sum_{j=1}^{k_b} Z_{H_{b1}(i-1,j)} u_j, \quad i = 3, 4, \ldots, r, r+2, r+3, \ldots, m_b \tag{16}$$
$$p_{r+1} = p_r + \sum_{j=1}^{k_b} Z_{H_{b1}(r,j)} u_j + Z_{h(r)} p_1 \tag{17}$$
Each sub-vector $p_i$ of the check vector $p$ is obtained through Equations (16) and (17), finally giving:
$$p = \left[\, p_1^T \; p_2^T \; \cdots \; p_{m_b}^T \,\right] \tag{18}$$
After $p$ is obtained, and since the input $u$ is known, $c$ is calculated from Equation (12), which completes the coding operation.
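The recursion above can be sketched at block level as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the base matrix `toy_hb1` and the parameters ($m_b = 4$, $k_b = 2$, $z = 4$, $r = 2$) are hypothetical, all function names are invented for the example, and $Z_q x$ is modelled as the cyclic shift $(Z_q x)[i] = x[(i + q) \bmod z]$. Because $h(1) = h(m_b)$, the terms $Z_{h(1)}$ and $Z_{h(m_b)}$ cancel over the binary field when the block equations are summed, so $p_1$ only needs the inverse of $Z_{h(r)}$, which is a shift in the opposite direction.

```python
import random


def shift(block, q):
    """Apply Z_q: zero block for q = -1, else (Z_q x)[i] = x[(i + q) % z]."""
    z = len(block)
    if q < 0:
        return [0] * z
    return [block[(i + q) % z] for i in range(z)]


def unshift(block, q):
    """Apply the inverse of Z_q (q >= 0): shift back by q positions."""
    z = len(block)
    return [block[(i - q) % z] for i in range(z)]


def xor(*blocks):
    """Element-wise modulo-2 sum of equally long blocks."""
    return [sum(bits) % 2 for bits in zip(*blocks)]


def row_sums(hb1, u_blocks):
    """s_i = sum_j Z_{Hb1(i,j)} u_j for every block row i."""
    return [xor(*[shift(u, q) for q, u in zip(row, u_blocks)]) for row in hb1]


def encode_recursive(hb1, h1, hr, r, u_blocks):
    """Return [p_1, ..., p_mb]; h1 = h(1) = h(m_b), hr = h(r), r is 1-based."""
    m_b = len(hb1)
    s = row_sums(hb1, u_blocks)
    p = [unshift(xor(*s), hr)]                 # sum of all block rows gives p_1
    p.append(xor(s[0], shift(p[0], h1)))       # first block row gives p_2
    for i in range(3, m_b + 1):                # 1-based index of p_i
        nxt = xor(p[i - 2], s[i - 2])          # p_i = p_{i-1} + s_{i-1}
        if i == r + 1:
            nxt = xor(nxt, shift(p[0], hr))    # extra Z_{h(r)} p_1 term in row r
        p.append(nxt)
    return p


def parity_checks_hold(hb1, h1, hr, r, u_blocks, p):
    """Check every block row of H_b2 p = H_b1 u for the quasi-double-diagonal H_b2."""
    m_b = len(hb1)
    s = row_sums(hb1, u_blocks)
    for i in range(1, m_b + 1):
        if i == 1:
            lhs = xor(shift(p[0], h1), p[1])
        elif i == m_b:
            lhs = xor(shift(p[0], h1), p[m_b - 1])
        else:
            lhs = xor(p[i - 1], p[i])
            if i == r:
                lhs = xor(lhs, shift(p[0], hr))
        if lhs != s[i - 1]:
            return False
    return True


rng = random.Random(1)
toy_hb1 = [[0, 2], [1, -1], [-1, 3], [2, 1]]   # hypothetical H_b1 entries
u_blocks = [[rng.randint(0, 1) for _ in range(4)] for _ in range(2)]
p_blocks = encode_recursive(toy_hb1, h1=1, hr=0, r=2, u_blocks=u_blocks)
print(parity_checks_hold(toy_hb1, 1, 0, 2, u_blocks, p_blocks))
```

The final check recomputes every block row of the parity equations independently of the recursion, so it serves as a consistency test rather than a restatement of the encoder.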

3.3. High-Efficiency and Low-Complexity Coding Algorithm (HE-LC)

When the standard coding algorithm generates CWs via a generator matrix, its drawback is that Gaussian elimination destroys the sparsity of the check matrix. Consequently, the coding complexity is proportional to the square of the code length, which consumes enormous hardware resources and imposes an excessive time delay [12,13,17,18]. By contrast, the recursive-iterative coding algorithm obtains the iterative formula of the check vector from the quasi-double-diagonal structure of the check matrix. This simplifies the coding process, but it increases the computational complexity of processing the check matrix.
Based on the research and analysis of the check matrix, we developed a high-efficiency and low-complexity coding algorithm. Specifically, through row and column permutation, an approximate upper triangular structure (Figure 1) is produced in the middle submodule of the check matrix. We then partition the check matrix so that the upper triangular matrix of the middle submodule forms an independent submatrix. The special structure of this submatrix allows efficient iterative coding, and the preprocessing preserves the sparsity of the check matrix.
In Figure 1, $A$ is an $(m - g) \times k$ matrix, $T$ is an $(m - g) \times (m - g)$ upper triangular matrix, $B$ is an $(m - g) \times g$ matrix, $C$ is a $g \times k$ matrix, $E$ is a $g \times (m - g)$ matrix, and $D$ is a $g \times g$ matrix. The basic check matrix is given as:
$$H_b = \begin{bmatrix} A_{(m_b - g) \times k_b} & T_{(m_b - g) \times (m_b - g)} & B_{(m_b - g) \times g} \\ C_{g \times k_b} & E_{g \times (m_b - g)} & D_{g \times g} \end{bmatrix} \tag{19}$$
In the basic check matrix $H_b$ with a standard code rate of 2/3, we set $n_b = 24$, $m_b = 8$, $k_b = 16$, and $g = 1$.
The transformed matrix is obtained by left-multiplying the basic check matrix $H_b$ by the full-rank square matrix $\begin{bmatrix} I_{m-g} & 0 \\ E T^{-1} & I_g \end{bmatrix}$:
$$\hat{H}_b = \begin{bmatrix} A & T & B \\ E T^{-1} A + C & 0 & E T^{-1} B + D \end{bmatrix} \tag{20}$$
In Equation (20), $I_{m-g}$ and $I_g$ represent the unit matrices of size $(m - g) \times (m - g)$ and $g \times g$, respectively.
The CW vector is set to $c = [u \; p_1 \; p_2]$, where $u$ contains the information bits and $p_1$ and $p_2$ contain the check bits. From the check equation $H_b c^T = 0 \Rightarrow \hat{H}_b c^T = 0$, the following are obtained:
$$\begin{cases} A u^T + T p_1^T + B p_2^T = 0 \\ (E T^{-1} A + C) u^T + (E T^{-1} B + D) p_2^T = 0 \end{cases} \tag{21}$$
Namely:
$$p_2^T = \varphi^{-1} (E T^{-1} A + C) u^T \tag{22}$$
$$p_1^T = T^{-1} (A u^T + B p_2^T) \tag{23}$$
where $\varphi = E T^{-1} B + D$ is a $g \times g$ full-rank square matrix (each element is to be replaced by a $z \times z$ square matrix).
The structural features of the $\varphi$ matrix were analyzed and then simulated in MATLAB, which revealed that $\varphi$ is a unit matrix over the binary field. For the LDPC code with a standard code rate of 2/3, the MATLAB simulation results for the $\varphi$ matrix are illustrated in Figure 2.
Since $\varphi$ is a unit matrix, Equations (22) and (23) reduce to:
$$p_2^T = (E T^{-1} A + C) u^T \tag{24}$$
$$p_1^T = T^{-1} (A u^T + B p_2^T) \tag{25}$$
With the high-efficiency and low-complexity coding algorithm, the computational complexity of $p_2$ and $p_1$ is $O(n + g^2)$ and $O(n)$, respectively. Hence, the proposed algorithm reduces the implementation complexity while achieving efficient coding performance. Moreover, it reduces the coder's hardware requirements, which facilitates hardware implementation. The implementation flow of the high-efficiency and low-complexity coding algorithm is presented in Table 1.
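The two formulas for the check bits can be exercised at bit level with a small sketch. Everything below is illustrative: the matrices are hypothetical toy values (with $k = 4$, $m - g = 3$, $g = 1$) chosen so that $\varphi = E T^{-1} B + D$ is the identity, the function names are invented for the example, and $T$ is assumed to be upper triangular with a unit diagonal so that $T^{-1}$ can be applied by substitution instead of explicit inversion. The naming of $p_1$ and $p_2$ follows the equations above, not the step numbering of Table 1.

```python
def matvec(M, v):
    """Matrix-vector product over GF(2)."""
    return [sum(a & b for a, b in zip(row, v)) % 2 for row in M]


def vec_add(a, b):
    """Vector addition over GF(2), i.e. bitwise XOR."""
    return [x ^ y for x, y in zip(a, b)]


def solve_upper(T, b):
    """Solve T x = b over GF(2) for upper-triangular T with unit diagonal."""
    n = len(b)
    x = [0] * n
    for i in reversed(range(n)):
        acc = b[i]
        for j in range(i + 1, n):
            acc ^= T[i][j] & x[j]
        x[i] = acc
    return x


def phi_is_identity(T, B, E, D):
    """Check that phi = E T^{-1} B + D is the identity, column by column."""
    g = len(D)
    for j in range(g):
        e_j = [1 if i == j else 0 for i in range(g)]
        col = vec_add(matvec(E, solve_upper(T, matvec(B, e_j))), matvec(D, e_j))
        if col != e_j:
            return False
    return True


def encode_helc(A, T, B, C, E, D, u):
    """Return c = [u, p1, p2], valid only when phi is the identity."""
    f1 = matvec(A, u)                              # A u^T
    f3 = solve_upper(T, f1)                        # T^{-1} A u^T
    p2 = vec_add(matvec(E, f3), matvec(C, u))      # (E T^{-1} A + C) u^T
    f6 = vec_add(f1, matvec(B, p2))                # A u^T + B p2^T
    p1 = solve_upper(T, f6)                        # T^{-1} (A u^T + B p2^T)
    return u + p1 + p2


# Hypothetical toy blocks of H = [A T B; C E D].
A = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 1]]
T = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]
B = [[1], [0], [0]]
C = [[0, 1, 0, 1]]
E = [[1, 0, 1]]
D = [[0]]
H = [a + t + b for a, t, b in zip(A, T, B)] + [c + e + d for c, e, d in zip(C, E, D)]

assert phi_is_identity(T, B, E, D)
c_word = encode_helc(A, T, B, C, E, D, [1, 0, 1, 1])
print(c_word, matvec(H, c_word))
```

Only matrix-vector products with the sparse blocks and two triangular solves are needed, which is where the linear complexity claimed in the text comes from.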
In the coding flow, the most computationally expensive step is the first one, $f_1 = A u^T$, where the matrix multiplication is carried out by cyclically shifting the blocks of $u^T$. As a result, the computational complexity grows linearly with the code length. $f_2$ and $f_5$ are obtained in the same way.
For an $n \times n$ identity matrix $E_{n \times n}$ and any column vector $A_{n \times 1}$, let $B_{n \times 1}$ be the vector obtained by cyclically shifting $A_{n \times 1}$ upward by $q$ positions, and let $E^{(q)}_{n \times n}$ be the matrix obtained by cyclically shifting the columns of $E_{n \times n}$ to the right by $q$ positions. Then $E^{(q)}_{n \times n} A_{n \times 1} = B_{n \times 1}$.
For step 2, $f_3 = T^{-1} f_1$, and step 6, $p_2 = T^{-1} f_6$, computing the inverse matrix explicitly would incur high computational complexity and would very likely destroy the sparsity of the matrix, which is not conducive to hardware implementation. For binary encoding, forward substitution can be used to solve this type of operation instead. In the implementation, $f_3 = T^{-1} f_1$ and $p_2 = T^{-1} f_6$ are computed simply with XOR gates. Likewise, $f_6 = f_1 + f_5$ and $p_1 = f_4 + f_2$ are computed with XOR gates. Finally, the complete codeword is obtained by concatenating the information sequence and the check sequence in order.
The computational complexity of the check bits $p_1$ and $p_2$ in the high-efficiency and low-complexity encoding algorithm is analyzed in Table 2 and Table 3. Both tables show that the computational complexity of the improved algorithm grows linearly with the code length.
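The shift property stated above is easy to check numerically. The sketch below builds the right-shifted identity matrix explicitly and compares the matrix-vector product with a plain upward cyclic shift; the function names and the sample vector are illustrative only.

```python
def shifted_identity(n, q):
    """n x n identity matrix with its columns cyclically shifted right by q."""
    return [[1 if col == (row + q) % n else 0 for col in range(n)]
            for row in range(n)]


def cyclic_shift_up(vec, q):
    """Cyclically shift a column vector upward by q positions."""
    n = len(vec)
    return [vec[(i + q) % n] for i in range(n)]


def matvec_gf2(M, v):
    """Matrix-vector product over GF(2)."""
    return [sum(a & b for a, b in zip(row, v)) % 2 for row in M]


a_vec = [1, 0, 1, 1, 0]
q = 2
product = matvec_gf2(shifted_identity(len(a_vec), q), a_vec)
shifted = cyclic_shift_up(a_vec, q)
print(product, shifted)
```

In hardware terms, this is why each non-zero block of the check matrix can be applied with a barrel shifter rather than a general multiplier.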

4. LDPC Decoding Algorithm

4.1. Log-Likelihood Ratio-Belief Propagation (LLR–BP) Algorithm

The LDPC soft-decision decoding algorithm with the BP algorithm serving as the basic algorithm affords a short operation time, high resource utilization rate, and high decoding throughput rate. The iterative message of the BP algorithm is generally expressed in the form of probability or LLR. The advantage of the LLR–BP algorithm over the probability-based BP algorithm lies in transforming many multiplication operations into additive operations, thus considerably relieving the implementation complexity of decoding while preserving the decoding performance [5,18,19,20,21,22,23,24,25,26,27,28,29].
This paper introduces the min-sum decoding algorithm (MSA), based on the LLR–BP algorithm, and combines it with the high-efficiency and low-complexity coding algorithm to complete the design and implementation of a coder-decoder (codec).
The decoder input after AWGN channel transmission is denoted as $y = (y_1, y_2, \ldots, y_n)$ with $y_i = x_i + n_i$, where $x_i$ ($i = 1, \ldots, n$) is the transmitted symbol sequence obtained by mapping the CW bits $c_i \in \{0, 1\}$ into $\{+1, -1\}$, and $n_i$ denotes AWGN with a mean of 0 and a variance of $\sigma^2$.
In the $k$-th decoding iteration, $\lambda_n^{(k)}$ denotes the posterior LLR of the $n$-th bit of the input data, $\lambda_{mn}^{(k)}$ is the message passed from variable node $n$ to check node $m$, and $\Lambda_{mn}^{(k)}$ is the message passed from check node $m$ to variable node $n$. The implementation flow of the LLR–BP algorithm is presented below:
① Initialization: for each $n \in \{1, 2, \ldots, N\}$ and each $m \in M(n)$, set $\lambda_{mn}^{(0)} = \lambda_n^{(0)}$. The initial LLR received by the decoder after channel transmission is expressed as:
$$\lambda_n^{(0)} = \log \frac{p(x_i = 0 \mid y_i)}{p(x_i = 1 \mid y_i)} = \frac{2 y_i}{\sigma^2} \tag{26}$$
In Equation (26), $\lambda_n^{(0)}$ is the initial LLR of variable node $n$, $M(n)$ represents the set of all check nodes $m$ connected to $n$, and $N(m)$ is the set of all variable nodes $n$ connected to $m$.
② Updating the check nodes: for each $m$ and each $n \in N(m)$, the following equation is calculated:
$$\Lambda_{mn}^{(k)} = \prod_{n' \in N(m) \setminus n} \mathrm{sign}\left( \lambda_{mn'}^{(k-1)} \right) \cdot \Phi^{-1}\left( \sum_{n' \in N(m) \setminus n} \Phi\left( \left| \lambda_{mn'}^{(k-1)} \right| \right) \right) \tag{27}$$
In Equation (27), the functions $\mathrm{sign}(x)$ and $\Phi(x)$ are defined as:
$$\mathrm{sign}(x) = \begin{cases} +1, & x \ge 0 \\ -1, & x < 0 \end{cases} \tag{28}$$
$$\Phi(x) = \Phi^{-1}(x) = \ln \frac{e^x + 1}{e^x - 1}, \quad x > 0 \tag{29}$$
where $N(m) \setminus n$ denotes the set of variable nodes in $N(m)$ with $n$ removed, and $M(n) \setminus m$ denotes the set of check nodes in $M(n)$ with $m$ removed.
③ Updating the variable nodes: for each $n$ and each $m \in M(n)$, we calculate:
$$\lambda_{mn}^{(k)} = \lambda_n^{(0)} + \sum_{m' \in M(n) \setminus m} \Lambda_{m'n}^{(k)} \tag{30}$$
The posterior LLR of each variable node is:
$$\lambda_n^{(k)} = \lambda_n^{(0)} + \sum_{m \in M(n)} \Lambda_{mn}^{(k)} \tag{31}$$
④ Decoding decision:
$$\hat{x}_n^{(k)} = \begin{cases} 0, & \lambda_n^{(k)} \ge 0 \\ 1, & \lambda_n^{(k)} < 0 \end{cases} \tag{32}$$
According to the check equation, the syndrome of the CW $\hat{x}^{(k)}$ obtained through decoding is:
$$S = H \left( \hat{x}^{(k)} \right)^T \tag{33}$$
If $S = 0$, the decoding has succeeded, and the process ends by outputting the decoded CW $\hat{x}^{(k)}$ as a valid result. If $S \ne 0$, steps ②, ③, and ④ are repeated until the preset maximum number of iterations is reached.
For the LLR–BP algorithm, the nonlinear function $\Phi(x)$ is implemented through a lookup table in hardware. The quantization of $\Phi(x)$ directly affects the decoder's functional implementation and resource consumption.
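The initialization and check-node steps of the flow above can be sketched in floating point as follows. This is an illustrative sketch only: the function names, the noise level, and the received samples are assumptions, $\Phi$ is evaluated with the natural logarithm (the form for which it is its own inverse), and no lookup table or quantization is modelled.

```python
import math


def channel_llr(y, sigma):
    """Initial LLR 2*y/sigma^2 for BPSK (0 -> +1, 1 -> -1) over AWGN."""
    return 2.0 * y / sigma ** 2


def phi(x):
    """Phi(x) = ln((e^x + 1) / (e^x - 1)) for x > 0; Phi is its own inverse."""
    return math.log((math.exp(x) + 1.0) / (math.exp(x) - 1.0))


def check_node_update_bp(incoming):
    """Outgoing message to each neighbour of one check node.

    `incoming` holds the variable-to-check messages of all neighbours; the
    message sent back to neighbour n excludes neighbour n's own contribution.
    Assumes all incoming messages are non-zero.
    """
    outgoing = []
    for n in range(len(incoming)):
        others = incoming[:n] + incoming[n + 1:]
        sign = 1.0
        for value in others:
            sign *= 1.0 if value >= 0 else -1.0
        outgoing.append(sign * phi(sum(phi(abs(value)) for value in others)))
    return outgoing


# Hypothetical received samples at one check node, with sigma = 0.8.
received = [0.9, -0.2, 1.1]
llrs = [channel_llr(y, sigma=0.8) for y in received]
messages = check_node_update_bp(llrs)
print(llrs)
print(messages)
```

Each outgoing magnitude is never larger than the smallest incoming magnitude among the other neighbours, which is the observation that the min-sum simplification builds on.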

4.2. Min-Sum Algorithm (MSA)

The MSA, a simplified form of the LLR–BP algorithm, applies the following simplification to the check node update [22,23,24,25,26,27,28,29,30,31,32]:
$$\Lambda_{mn}^{(k)} = \prod_{n' \in N(m) \setminus n} \mathrm{sign}\left( \lambda_{mn'}^{(k-1)} \right) \cdot \min_{n' \in N(m) \setminus n} \left| \lambda_{mn'}^{(k-1)} \right| \tag{34}$$
The MSA avoids implementing the function $\Phi(x)$ by taking a minimum and performing additions only, which reduces the implementation complexity of decoding to some extent. This paper uses the MSA as the decoding algorithm, with its implementation flow depicted in Figure 3.

5. Design and Implementation of the Codec

5.1. Performance Analysis of Codec Algorithm

In the subsequent trials, we combined each of the coding algorithms with the MSA to design a (1536, 1024) LDPC codec with the 2/3 code rate of the IEEE802.16e standard, (6, 2) uniform quantization, and eight iterations. In all cases, we considered BPSK modulation and AWGN channel transmission. The corresponding codec performance simulation results are illustrated in Figure 4.
At a bit error rate (BER) of $10^{-5}$, the signal-to-noise ratio required by the codec using the high-efficiency and low-complexity coding algorithm is about 0.25 dB lower than with the recursive-iterative coding algorithm and about 1.25 dB lower than with the standard coding algorithm. The simulation results presented in Figure 4 show that the LDPC codec implemented with the high-efficiency and low-complexity coding algorithm combines high coding performance with low complexity. As a reminder, the LDPC coder designed and implemented in this paper uses the high-efficiency and low-complexity coding algorithm, whereas the decoder employs the MSA.

5.2. Coder Design and Implementation

5.2.1. Coder Design

The coder structure (Figure 5), designed and implemented using the high-efficiency and low-complexity coding algorithm, comprises an input/output random access memory (RAM) module, a vector adder (VA) module, a matrix-vector multiplier (MVM) module, a CW generator (CWG) module, a cache module, a forward-substitution (FS) module, and a control module [28,29,30,31,32,33,34,35].
Input/output RAM module
To process data continuously during encoding and reduce the time cost of the operations, the ping-pong pipeline method shown in Figure 6 is used to store the data. The input RAM and output RAM are both dual-port RAMs. The ping-pong operation allows data to be processed seamlessly and continuously, saves storage space, and is easy to implement in hardware.
Matrix-Vector Multiplier (MVM) Module
The multiplier in the encoder operates in parallel and consists of a cyclic shifter and a modulo-2 adder. Its structural principle diagram is shown in Figure 7. The high 4 bits of the register store the row number in the check matrix, the middle 4 bits store the column number, and the low 7 bits store the cyclic shift value. Cyclically shifting the corresponding block of information bits completes the multiplication of one non-zero element by the information bits. Multiplying an entire row of the matrix by the information bits then amounts to multiplying every non-zero element in that row by the information bits. Finally, a modulo-2 addition of these products completes the matrix multiplication.
In the design and implementation of the encoder, the core issues are how to store the check matrix information effectively and how to reduce the amount of storage. As Figure 7 shows, each element of a row in the base check matrix is stored as a fixed 15-bit word, which saves a large amount of storage and facilitates hardware implementation. For the LDPC code studied in this project, storing the check matrix with traditional methods requires $4.42 \times 10^5$ bits, whereas the above method requires only $8.38 \times 10^3$ bits.
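The 15-bit register layout described above (4-bit row number, 4-bit column number, 7-bit shift value) can be illustrated with a small pack/unpack sketch. The function names are hypothetical, and zero-based row and column indices are an assumption of the example; the paper does not state the indexing convention.

```python
def pack_entry(row, col, shift):
    """Pack one non-zero base-matrix entry into a 15-bit word.

    Layout (most significant bit first): 4-bit row | 4-bit column | 7-bit shift.
    """
    if not (0 <= row < 16 and 0 <= col < 16 and 0 <= shift < 128):
        raise ValueError("field out of range for the 4/4/7-bit layout")
    return (row << 11) | (col << 7) | shift


def unpack_entry(word):
    """Recover (row, col, shift) from a 15-bit word."""
    return (word >> 11) & 0xF, (word >> 7) & 0xF, word & 0x7F


# Illustrative entry: row 2, column 11, shift 15.
word = pack_entry(2, 11, 15)
print(format(word, "015b"), unpack_entry(word))
```

Storing only the non-zero entries in this form, rather than the expanded binary matrix, is what makes the reduction in storage quoted in the text possible.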
Forward-Substitution (FS) Module
In the encoder design, forward substitution is used to solve $f_3 = T^{-1} f_1$, and the calculation can be completed using XOR operations only. It has low computational complexity, low resource consumption, and is easy to implement in hardware. The calculation process is as follows:
Let $f_1 = (v_1, v_2, \ldots, v_{m_b - 1})$ and $f_3 = (c_1, c_2, \ldots, c_{m_b - 1})$. Then:
$$f_3 = T^{-1} f_1 \;\Rightarrow\; T f_3 = f_1 \;\Rightarrow\; \begin{cases} c_1 = v_1 \\ c_1 + c_2 = v_2 \\ c_2 + c_3 = v_3 \\ \quad \vdots \\ c_{m_b - 2} + c_{m_b - 1} = v_{m_b - 1} \end{cases} \;\Rightarrow\; \begin{cases} c_1 = v_1 \\ c_2 = v_2 + v_1 \\ c_3 = v_3 + v_2 + v_1 \\ \quad \vdots \\ c_{m_b - 1} = v_{m_b - 1} + \cdots + v_3 + v_2 + v_1 \end{cases} \tag{35}$$
The computation is executed in parallel, which reduces both the computational complexity and the computational latency. The same method is used to solve $p_2 = T^{-1} f_6$.
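For the bidiagonal system written out above, the substitution collapses to a running XOR: each $c_i$ is $v_i$ XORed with the previous result. The sketch below shows this sequentially; the hardware described in the text evaluates the sums in parallel, and the function name and sample values are illustrative only. Each element may be a single bit or a $z$-bit block held in an integer, since XOR acts bitwise.

```python
def forward_substitute_bidiagonal(v):
    """Solve c_1 = v_1, c_{i-1} + c_i = v_i over GF(2) with a running XOR."""
    c = []
    acc = 0
    for value in v:
        acc ^= value          # c_i = v_i XOR c_{i-1}
        c.append(acc)
    return c


v_bits = [1, 0, 1, 1]
c_bits = forward_substitute_bidiagonal(v_bits)
print(c_bits)

# The same routine on 4-bit blocks stored as integers.
c_blocks = forward_substitute_bidiagonal([0b1010, 0b0110, 0b0001])
print([format(block, "04b") for block in c_blocks])
```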
Codeword generator (CWG) module
The codeword generator module combines the calculated check bits $p_1$ and $p_2$ with the information bits into a complete codeword, and then stores it in the output RAM according to the codeword composition rule $(u, p_1, p_2)$. The schematic diagram is shown in Figure 8.
Other modules
In addition to the modules discussed above, the encoder contains a vector adder module (VA), a control module (Control), and a cache module (Cache). The VA module performs addition between vectors, the control module generates the control signals required during encoding, and the cache module buffers data so that they stay synchronized with the next processing step.

5.2.2. Coder Implementation

During the coder implementation, the correctness and feasibility of the coding algorithm and design flow are first verified via MATLAB simulation. Each module is then designed and implemented in the Vivado integrated environment according to the coder's functional block diagram, followed by the hardware design of the coder. Finally, functional simulation and a board-level test are conducted on the implemented coder, which completes a coder implementation that meets the requirements and verifies the effectiveness and feasibility of the design.
MATLAB simulation results
To verify the correctness and feasibility of the designed encoder, the M files for each module of the encoding process are first written in MATLAB. A 1024-bit binary information sequence is then input in MATLAB for encoding. After encoding, the 1536-bit output is compared with the theoretical values to determine the correctness of the design.
Figure 9 shows the input and output data of the designed encoder during simulation verification. The diagram shows the input information of the encoder, the check information generated by the encoder, and the output codeword after encoding. The software simulation results and the theoretical calculation results were found to be identical, which further verifies the correctness and feasibility of the encoder designed in this article.
FPGA Implementation Results
The correctness and feasibility of the encoder design process were verified through MATLAB simulation. Building on the software simulation, we first wrote each module in Verilog HDL in Vivado 2018.2 according to the encoder implementation schematic; we then instantiated each module in the top-level file to complete the hardware design of the encoder; and finally, through functional simulation and board-level testing, we completed an encoder that meets the performance requirements of the project.
The performance indicators during encoder testing and verification are as follows: the input data transmission rate was 2 Mbps and the system clock frequency was 150 MHz (the highest operating frequency of the system is 250 MHz). The functional simulation results from the hardware design and implementation of the LDPC encoder are shown in Figure 10, Figure 11, and Table 4.
Figure 10 is the RTL-level simulation implementation diagram of the encoder. To simulate and verify the implemented encoder, a signal source module was added during the design to generate input data for the verification tests. Table 4 shows the resource usage report of the encoder: the implementation requires 10752 LUTs, accounting for 20.00% of the total, and 12658 registers, accounting for 12.00% of the total. Figure 11 shows the simulation test results of the encoder, which is designed using the high-efficiency and low-complexity encoding algorithm. The system clock frequency clk is 150 MHz, u represents the 1024-bit data input to the encoder, $p_1$ and $p_2$ represent the 64-bit and 448-bit check data generated by the encoder, and $u_1$ represents the 1536-bit data output after encoding. Figure 11 shows that the encoder takes 5632 clock cycles to encode one frame of data. Using the ping-pong pipeline operation during encoding saves a considerable amount of time and improves the encoding speed. The input and output data of the encoder are consistent with the MATLAB simulation results, verifying the correctness and feasibility of the encoder hardware design.
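As a rough cross-check of these figures, the frame time and the implied information rate can be worked out from the stated clock frequency and cycle count. This is a back-of-the-envelope estimate, assuming one frame is completed every 5632 cycles with no overlap between frames; the symbols $t_{\text{frame}}$ and $R_{\text{info}}$ are introduced here for illustration only.

```latex
t_{\text{frame}} = \frac{5632}{150 \times 10^{6}\ \text{Hz}} \approx 37.5\ \mu\text{s},
\qquad
R_{\text{info}} \approx \frac{1024\ \text{bits}}{37.5\ \mu\text{s}} \approx 27.3\ \text{Mbit/s}
```

Under that assumption, the encoder would have ample margin over the 2 Mbps input data rate used in the test.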

5.3. Decoder Design and Implementation

5.3.1. Decoder Design

The decoding process uses the MSA, and the corresponding serial implementation structure is shown in Figure 12. Specifically, the serial structure of the MSA comprises a variable node processor (VNP), a check node processor (CNP), an intermediate message RAM, a control unit, and several data memories.
Variable Node Processor (VNP)
The variable node processor has two functions: one is to input and output data, and the other is to update the variable node information. During the initialization phase, because all data stored in the Message RAM are 0, the data produced by the first summation are simply the corresponding channel information in sequence. The VNP updates the variable node information according to Equation (30).
Check Node Processor (CNP)
The check node processor updates the check node information, and it does so according to Equation (34).
Message RAM
The intermediate message memory is a dual-port RAM, with one port read-only and the other port write-only. The VNP and CNP read and write the Message RAM in turn under the control of enable signals. The capacity of the Message RAM is set to 24576 bits, i.e., a bit width of 4 and a depth of 6144.
Control Unit
The control unit performs the control functions of the decoding system, including controlling the number of iterations, controlling the working state of the nodes, and controlling the output of the decoded codewords at the end of decoding.
Interleaver
In the decoder implementation, only one Message RAM is used to store data. When storing VNP and CNP data, an interleaver is needed to interleave and reorder the CNP data to improve the error performance of the decoder. The data relationships in the interleaver are uniquely determined by the check matrix.
Other module units
Initial data storage (Src-Mem): This is used to store the data to be decoded, with a data width of 4 bits and a depth of 6144. Decoding Result Memory (Result-Mem): This is used to store the data output after decoding, with a data width of 4 bits and a depth of 6144.

5.3.2. Decoder Implementation

The decoder implementation follows the same procedure as the coder implementation: the correctness and feasibility of the decoding algorithm and design flow are verified first, and the decoder is then designed, implemented, and evaluated, which confirms that it meets the requirements and verifies the correctness and feasibility of the design.
MATLAB simulation results
To verify the correctness and feasibility of the decoder design, the M files for each component module of the decoder were first written in MATLAB, and the complete decoder was then simulated. The correctness of the designed decoder was determined by analyzing the simulation waveforms. In the MATLAB simulation, the input of the decoder is the data transmitted through the channel after encoding and modulation. The channel is set as an additive white Gaussian noise channel, and the signal-to-noise ratio is set to 3 dB. The MATLAB simulation waveforms are shown in Figure 13 and Figure 14.
FPGA Implementation Results
The correctness and feasibility of the decoder design process were verified through MATLAB simulation. Building on the software simulation, we first wrote Verilog HDL code for each module in Vivado 2018.2 according to the decoder implementation schematic; we then instantiated each module in the top-level file to complete the hardware design of the decoder; finally, through functional simulation and board-level testing, we completed the implementation of a decoder that meets the performance requirements of the project.
The performance indicators during the decoder testing and verification process were as follows: the input data transmission rate was 2 Mbps and the system clock frequency was 150 MHz (the highest operating frequency of the system is 250 MHz). The simulation and test results obtained during the hardware design and implementation of the LDPC decoder are shown in Figure 15 and Figure 16, and the resource usage is reported in Table 5.
Figure 15 shows the RTL-level simulation results of the encoder and decoder, and Table 5 shows the resource usage report of the encoder and decoder.
Figure 16 shows the test results of the decoder. The decoder was designed using the min-sum decoding algorithm, with 8 decoding iterations and the (4,2) uniform quantization method. The system clock frequency, clk, was 150 MHz. The signals are as follows: over is the end-of-decoding flag, and a high level indicates that decoding has finished; the Cnp_on enable signal determines whether the next state is variable node update processing or check node update processing; vnp_Finish indicates that variable node update processing is complete; cnp_Finish indicates that check node update processing is complete; last_Iteration marks the last iteration cycle and prompts the variable node processor to output the decoding result afterwards; processing_Num is the number of decoding iterations, and the maximum number of iterations for the designed decoder is 8; AP_Rst_N is the system reset signal, with high-level reset; AP_Done is the enable signal for outputting the decoding result; llr_V_TDATA[7:0] is the 1024 bit initialization data input to the decoder; and is_R_TDATA[7:0] is the 1536 bit decoded data output after decoding is completed. From the analysis of the LDPC check matrix used, combined with the decoding simulation results, it can be seen that the decoder takes 5632 clock cycles to decode one frame of data. During decoding, each variable node and check node is processed one by one. The input and output data of the decoder are consistent with the MATLAB simulation results, verifying the correctness and feasibility of the decoder hardware design.
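The alternation between check node processing (CNP) and variable node processing (VNP) over a fixed number of iterations can be illustrated in software. The following is a minimal floating-point Python sketch of flooding-schedule min-sum decoding with 8 iterations; it is not the authors' serial hardware architecture, it omits quantization, and the (7, 4) Hamming check matrix `H_HAMMING` is a hypothetical stand-in for the (1536, 1024) check matrix.

```python
def min_sum_decode(H, llr, num_iter=8):
    """Min-sum decoding with a fixed number of iterations; returns hard bits.

    Assumes every check row of H has weight of at least 2.
    """
    m, n = len(H), len(H[0])
    rows = [[c for c in range(n) if H[r][c]] for r in range(m)]
    # Variable-to-check messages start as the channel LLRs.
    v2c = {(r, c): llr[c] for r in range(m) for c in rows[r]}
    c2v = {edge: 0.0 for edge in v2c}
    total = list(llr)
    for _ in range(num_iter):
        # CNP: sign product and minimum magnitude over the other edges.
        for r in range(m):
            for c in rows[r]:
                sign, mag = 1.0, float("inf")
                for c2 in rows[r]:
                    if c2 != c:
                        msg = v2c[(r, c2)]
                        if msg < 0:
                            sign = -sign
                        mag = min(mag, abs(msg))
                c2v[(r, c)] = sign * mag
        # VNP: posterior LLRs, then extrinsic messages back to the checks.
        total = list(llr)
        for (r, c), msg in c2v.items():
            total[c] += msg
        for edge in list(v2c):
            v2c[edge] = total[edge[1]] - c2v[edge]
    return [0 if t >= 0 else 1 for t in total]


# Hypothetical toy check matrix: (7, 4) Hamming code.
H_HAMMING = [[1, 1, 0, 1, 1, 0, 0],
             [1, 0, 1, 1, 0, 1, 0],
             [0, 1, 1, 1, 0, 0, 1]]

# All-zero codeword with one unreliable, wrongly signed LLR on bit 0.
decoded = min_sum_decode(H_HAMMING, [-1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0])
# decoded == [0, 0, 0, 0, 0, 0, 0]
```

Running a fixed iteration count, rather than stopping early on a zero syndrome, keeps the decoding time per frame constant, which matches the fixed cycle count reported above; whether the hardware also supports early termination is not stated.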

6. Conclusions

LDPC codes are highly regarded as linear block codes due to their exceptional coding performance, low decoding complexity, flexible structure, and easy hardware implementation. Consequently, they have become a prominent research topic in the field of channel coding. To address the challenge of high computational complexity in implementing standard encoding algorithms, this study proposes an efficient and low-complexity encoding algorithm through extensive research and analysis of check matrices, building upon the investigation of standard encoding algorithms and recursive iterative encoding algorithms.
By combining the LLR-BP algorithm with the min-sum decoding algorithm, we implemented a (1536, 1024) LDPC encoder and decoder with a 2/3 code rate under the IEEE 802.16e standard, using (6, 2) uniform quantization and 8 decoding iterations. The implementation was tested through software simulation in MATLAB and hardware testing on the Xilinx Zynq7020 FPGA platform. The results of both the software simulation and the hardware testing show that, with min-sum decoding, BPSK modulation, and AWGN channel transmission, the proposed encoding algorithm achieves an improvement of approximately 0.25 dB over the recursive iterative encoding algorithm and approximately 1.25 dB over the standard encoding algorithm at a BER of $10^{-5}$. These findings support the correctness, effectiveness, and feasibility of the proposed algorithm. The experimental results indicate that the proposed encoding algorithm not only reduces computational complexity and logic delay in the encoder implementation but also improves encoding performance and data transmission reliability. This has theoretical and practical significance for the wider application and further development of LDPC codes in digital communication.

Author Contributions

Methodology, X.L., J.G. and Z.L.; Software, Y.X.; Data curation, Y.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under grant no. 61703060, the National Engineering Research Center for Oil & Gas Drilling Equipment under grant no. 202307, and the Sichuan Science and Technology Plan Project under grant nos. 2020YFS0507, 2021YFG0361, 2021YFS0311, and 2023YFS0426.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Liao, X.; Yu, N.; Luo, Z.; Ren, X.; Fang, A. Research and Design of LDPC Codes Based on MATLAB/Simulink. J. Chengdu Univ. (Nat. Sci. Ed.) 2017, 36, 272–275+303. [Google Scholar]
  2. Gallager, R.G. Low Density Parity Check Codes. IRE Trans. Inf. Theory 1962, 8, 2018–2220. [Google Scholar] [CrossRef]
  3. MacKay, D.J.; Neal, R.M. Near Shannon Limit Performance of Low Density Parity Check Codes. Electron. Lett. 1996, 32, 1645–1646. [Google Scholar] [CrossRef]
  4. Xu, R. Design of LDPC Decoder for CCSDS Deep Space Communication Standard; Xi’an University of Technology: Xi’an, China, 2010; pp. 17–18. [Google Scholar]
  5. Yang, X. Turbo and LDPC Codec and Their Applications; People’s Posts and Telecommunications Publishing House: Beijing, China, 2010. [Google Scholar]
  6. Wang, M. Design and FPGA Implementation of High Speed LDPC Codec; University of Electronic Science and Technology: Chengdu, China, 2016. [Google Scholar]
  7. Guo, H. Research on Implementation of LDPC Codes Based on IEEE802.16e Standard; Harbin Institute of Technology: Harbin, China, 2010. [Google Scholar]
  8. Fan, K. Research and Implementation of LDPC Coding and Decoding Technology Based on IEEE802.16e; Xi’an University of Electronic Science and Technology: Xi’an, China, 2009. [Google Scholar]
  9. Wu, Z.; Zhang, L.; Zhong, Z.; Liu, R. Reconstruction of LDPC Sparse Check Matrix under High Bit Error Rate. J. Commun. 2021, 42, 1–10. [Google Scholar]
  10. Du, G. An overview of the principle and application of LDPC codes. China New Commun. 2012, 14, 25–33. [Google Scholar]
  11. Li, P.; Qi, F.; He, D.; Li, J. Design of IEEE 802.16e standard LDPC encoder based on FPGA. Mod. Navig. 2022, 13, 212–217+222. [Google Scholar]
  12. Guodong, W.A.; Jinming, L.I.; Zhiwang, Z.H.; Denghui, T.I. Design and Implementation of LDPC Encoder Based on FPGA. J. Meas. Sci. Instrum. 2021, 12, 12–19. [Google Scholar]
  13. Xue, W.; Yu, H.; Wang, J.; Shu, F. Optimization of High Efficiency LDPC Decoder and Implementation of FPGA. Data Acquis. Process. 2018, 33, 1101–1111. [Google Scholar]
  14. Sun, N. Research on Parity Check Matrix Construction and Decoding Optimization Algorithm of LDPC Codes; Shandong University: Jinan, China, 2019. [Google Scholar]
  15. Liao, P. Research and Design Implementation of LDPC Code High Speed Decoder in Deep Space Communication; Yanshan University: Qinghuangdao, China, 2022. [Google Scholar] [CrossRef]
  16. Shi, S.; Wang, R.; Li, H.; Han, C. Implementation of Multipath Parallel Encoder for LDPC Code. J. Electron. Meas. Instrum. 2021, 35, 83–89. [Google Scholar] [CrossRef]
  17. Shao, B. Research and Implementation of LDPC Code in 5G Communication System; Xi’an University of Electronic Science and Technology: Xi’an, China, 2022. [Google Scholar] [CrossRef]
  18. Richardson, T.J.; Urbanke, R.L. Efficient encoding of low-density parity-check codes. IEEE Trans. Inf. Theory 2001, 47, 638–655. [Google Scholar] [CrossRef]
  19. Lee, J.H.; Sunwoo, M.H. Low-Complexity High-Throughput Bit-Wise LDPC Decoder. J. Signal Process. Syst. 2019, 91, 855–862. [Google Scholar] [CrossRef]
  20. Zhang, C.; Su, K. Practice of Digital Signal Processing and Engineering Application of FPGA; China Railway Press: Beijing, China, 2013. [Google Scholar]
  21. Guo, L.; Chen, H. LDPC coding and decoding method based on FPGA for IEEE 802.16e. Autom. Technol. Appl. 2017, 36, 49–53. [Google Scholar]
  22. Shan, B.; Li, Z. Design and performance analysis of improved LDPC decoding scheme. Comput. Eng. Des. 2019, 40, 1507–1511. [Google Scholar]
  23. Chen, F. Low Complexity Deep Learning LDPC Decoding; Central South University for Nationalities: Wuhan, China, 2021. [Google Scholar] [CrossRef]
  24. Yang, H. Research and Implementation of LDPC Decoder in Satellite Communication; Xi’an University of Electronic Science and Technology: Xi’an, China, 2023. [Google Scholar] [CrossRef]
  25. Luo, X. Research on Hybrid Decoding Algorithms for LDPC Codes; University of Electronic Science and Technology: Chengdu, China, 2022. [Google Scholar] [CrossRef]
  26. Wang, D. Improvement of Decoding Algorithm Based on LDPC Code and FPGA Implementation; Nanjing University of Information Engineering: Nanjing, China, 2021. [Google Scholar] [CrossRef]
  27. Wang, L.; Li, J. Design and Implementation of LDPC Decoder Based on FPGA. Electron. Meas. Technol. 2022, 45, 22–27. [Google Scholar] [CrossRef]
  28. Li, J.; Chen, B. FPGA Implementation of QC-LDPC Decoder Based on Minimum Sum Algorithm. Appl. Sci. Technol. 2020, 47, 35–40. [Google Scholar]
  29. Chen, F.; Liu, Y.; Tang, C. A Low Complexity Normalized Minimum Sum Decoding Algorithm for LDPC Codes. J. Chongqing Univ. Posts Telecommun. (Nat. Sci. Ed.) 2020, 32, 92–98. [Google Scholar]
  30. Sun, J.; Li, J. LDPC Minimum Sum Decoding Algorithm and Its IC Physical Design. J. Meas. Sci. Instrum. 2023, 14, 108–115. [Google Scholar]
  31. Li, J.; Zhang, P.; Wang, L.; Wang, G. An FPGA LDPC decoder for optimizing scaling factors in NMS decoding algorithms. J. Meas. Sci. Instrum. 2022, 13, 398–406. [Google Scholar]
  32. Yang, P.; Jun, B.; No, J.S.; Park, H. A new two-stage decodingscheme with unreliable path search to lower the error-floor for low-density parity-check codes. IET Commun. 2017, 11, 2173–2180. [Google Scholar] [CrossRef]
  33. Han, X. Design and Optimization of LDPC Codec in High Speed WLAN System; Beijing University of Posts and Telecommunications: Beijing, China, 2014. [Google Scholar]
  34. Wang, H.; Guo, D. Design of LDPC decoder based on FPGA. J. Lul. Univ. 2019, 9, 34–40. [Google Scholar]
  35. Gu, S.; Luo, Z.; Chu, Y.; Xu, Y.; Guo, J. A Suboptimal Optimizing Strategy for Velocity Vector Estimation in Single-Observer Passive Localization. Sensors 2023, 23, 5940. [Google Scholar] [CrossRef]
Figure 1. Approximate upper triangular structural form of a check matrix.
Figure 2. Simulation diagram of the φ matrix generation.
Figure 3. Implementation flow of MSA algorithm.
Figure 4. Performance simulation diagram of different coding algorithms.
Figure 5. Functional block diagram of coder implementation.
Figure 6. Principle diagram of the ping-pong pipeline mode.
Figure 7. Schematic diagram of matrix multiplier.
Figure 8. Principle diagram of codeword generator.
Figure 9. The coder’s MATLAB simulation results.
Figure 10. RTL-level simulation diagram of LDPC encoder.
Figure 11. The coder's hardware test results. The solid boxes mark the check bits $p_2$ generated during the encoding process.
Figure 12. Functional block diagram of the decoder designed through the serial structure of the MSA algorithm.
Figure 13. MATLAB simulation results of decoder.
Figure 14. MATLAB simulation results of codec.
Figure 15. RTL-level simulation results of coder and decoder.
Figure 16. Hardware test results of codec.
Table 1. Implementation flow of the high-efficiency and low-complexity coding algorithm.

| Step | Computational Formula |
|---|---|
| Step I | Calculate: $f_1 = A u^T$, $f_2 = C u^T$ |
| Step II | Calculate: $f_3 = T^{-1} f_1$, $f_4 = E f_3$ |
| Step III | Calculate: $p_2 = f_4 + f_2$ |
| Step IV | Calculate: $f_5 = B p_2^T$ |
| Step V | Calculate: $f_6 = f_1 + f_5$ |
| Step VI | Calculate: $p_1 = T^{-1} f_6$ |
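The six steps of Table 1 can be traced on a small example. The Python sketch below works over GF(2) and replaces each multiplication by $T^{-1}$ with a substitution solve, which is a common way to avoid forming the inverse. Several things here are assumptions rather than the authors' implementation: $T$ is taken to be lower triangular with a unit diagonal (an upper-triangular $T$ would use back substitution instead), the toy matrices are hypothetical, the block $D$ is set so that the six steps produce a valid codeword for $H = [A\ B\ T;\ C\ D\ E]$, and the output order $(u, p_2, p_1)$ is chosen only to match the column blocks of that toy $H$.

```python
def matvec2(M, v):
    """Matrix-vector product over GF(2)."""
    return [sum(a & b for a, b in zip(row, v)) % 2 for row in M]


def add2(x, y):
    """Vector addition over GF(2) (bitwise XOR)."""
    return [a ^ b for a, b in zip(x, y)]


def solve_lower2(T, b):
    """Solve T x = b over GF(2); T lower triangular with unit diagonal."""
    x = []
    for i, row in enumerate(T):
        acc = b[i]
        for j in range(i):
            acc ^= row[j] & x[j]
        x.append(acc)
    return x


def encode(u, A, B, C, E, T):
    f1 = matvec2(A, u)            # Step I:   f1 = A u^T
    f2 = matvec2(C, u)            #           f2 = C u^T
    f3 = solve_lower2(T, f1)      # Step II:  f3 = T^{-1} f1
    f4 = matvec2(E, f3)           #           f4 = E f3
    p2 = add2(f4, f2)             # Step III: p2 = f4 + f2
    f5 = matvec2(B, p2)           # Step IV:  f5 = B p2^T
    f6 = add2(f1, f5)             # Step V:   f6 = f1 + f5
    p1 = solve_lower2(T, f6)      # Step VI:  p1 = T^{-1} f6
    return u + p2 + p1            # assumed ordering (u, p2, p1)


# Hypothetical toy blocks: 3 information bits, 3 parity bits.
A = [[1, 1, 0], [0, 1, 1]]
B = [[1], [0]]
T = [[1, 0], [1, 1]]
C = [[1, 0, 1]]
D = [[0]]
E = [[0, 1]]
H = [a + b + t for a, b, t in zip(A, B, T)] + [c + d + e for c, d, e in zip(C, D, E)]

codeword = encode([1, 0, 1], A, B, C, E, T)   # [1, 0, 1, 0, 1, 0]
syndrome = matvec2(H, codeword)               # [0, 0, 0]
```

Each step is a sparse matrix-vector product or a triangular solve, which is consistent with the linear complexity entries listed in Tables 2 and 3.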
Table 2. Operational complexity analysis table of $p_1$.

| Operating Steps | Complexity |
|---|---|
| $A u^T$ | $O(n)$ |
| $C u^T$ | $O(n)$ |
| $T^{-1} A u^T$ | $O(n)$ |
| $E T^{-1} A u^T$ | $O(n)$ |
| $(E T^{-1} A + C) u^T$ | $O(n)$ |
Table 3. Operational complexity analysis table of $p_2$.

| Operating Steps | Complexity |
|---|---|
| $A u^T$ | $O(n)$ |
| $B p_1^T$ | $O(n)$ |
| $A u^T + B p_1^T$ | $O(n)$ |
| $T^{-1}(A u^T + B p_1^T)$ | $O(n)$ |
Table 4. Encoder resource use report. * Name annotation of logic devices in the Xilinx Zynq7020 FPGA hardware resources.

| Site Type | Used | Fixed | Available | Util% |
|---|---|---|---|---|
| Slice LUTs * | 10,752 | 0 | 53,200 | 20.00 |
| LUT as Logic | 10,752 | 0 | 53,200 | 20.00 |
| LUT as Memory | 0 | 0 | 17,400 | 0.00 |
| Slice Registers | 12,658 | 0 | 106,400 | 12.00 |
| Register as Flip Flop | 12,658 | 0 | 106,400 | 12.00 |
| Register as Latch | 0 | 0 | 106,400 | 0.00 |
| F7 Muxes | 1904 | 0 | 26,600 | 7.00 |
| F8 Muxes | 888 | 0 | 13,300 | 7.00 |
Table 5. Coder and decoder resource usage report. * Name annotation of logic devices in the Xilinx Zynq7020 FPGA hardware resources.

| Site Type | Used | Fixed | Available | Util% |
|---|---|---|---|---|
| Slice LUTs * | 36,689 | 0 | 53,200 | 69.00 |
| LUT as Logic | 36,022 | 0 | 53,200 | 68.00 |
| LUT as Memory | 667 | 0 | 17,400 | 4.00 |
| LUT as Distributed RAM | 171 | 0 | | |
| LUT as Shift Register | 496 | 0 | | |
| Slice Registers | 28,536 | 0 | 106,400 | 27.00 |
| Register as Flip Flop | 28,536 | 0 | 106,400 | 27.00 |
| Register as Latch | 0 | 0 | 106,400 | 0.00 |
| F7 Muxes | 2480 | 0 | 26,600 | 9.00 |
| F8 Muxes | 920 | 0 | 13,300 | 7.00 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Liao, X.; Guo, J.; Luo, Z.; Xu, Y.; Chu, Y. Research and Implementation of High-Efficiency and Low-Complexity LDPC Coding Algorithm. Electronics 2023, 12, 3696. https://doi.org/10.3390/electronics12173696
