A Parallel Timing Synchronization Structure in Real-Time High Transmission Capacity Wireless Communication Systems

Hao, Xin; Lin, Changxing; Wu, Qiuyu

doi:10.3390/electronics9040652


by Xin Hao ^1,2,*, Changxing Lin ^1,2 and Qiuyu Wu ^1,2,*

¹ Microsystem & Terahertz Research Center, China Academy of Engineering Physics, Chengdu 610200, China
² Institute of Electronic Engineering, China Academy of Engineering Physics, Mianyang 621900, China
* Authors to whom correspondence should be addressed.

Electronics 2020, 9(4), 652; https://doi.org/10.3390/electronics9040652

Submission received: 15 February 2020 / Revised: 9 April 2020 / Accepted: 10 April 2020 / Published: 16 April 2020

(This article belongs to the Special Issue Emerging Applications of Recent FPGA Architectures)


Abstract

In the past few years, parallel digital signal processing (PDSP) architectures have been intensively studied to meet the growing demand for channel capacity in coherent optical communication systems. However, to our knowledge, real-time timing synchronization in such architectures has not yet been implemented on a Field Programmable Gate Array (FPGA). In this article, a parallel timing synchronization architecture is proposed. The architecture adopts a parallel First In First Out (FIFO) structure based on an index-associated rearranging method and a dual feedback loop based on Gardner's algorithm. Thanks to the FIFO structure, 67% of the Look-Up Table (LUT) resources are saved in comparison with earlier results, while the Numerically Controlled Oscillator (NCO) is improved to meet the FPGA timing requirements for real-time operation. MATLAB simulations are run to evaluate the Bit Error Rate (BER) deterioration of the architecture. The floating- and fixed-point simulation results show BER deteriorations of less than 0.5 dB and 1 dB, respectively. Further, the architecture is implemented on a Xilinx XC7VX485T FPGA chip, and a 20 gigabit per second (Gbps) 16 Quadrature Amplitude Modulation (16QAM) real-time system is achieved at a system clock of 159.524 MHz. This work opens a new pathway to improving the transmission capacity of real-time wireless communication systems.

Keywords:

parallel digital signal processing (PDSP); wireless communication; real-time; FPGA; timing synchronization

1. Introduction

In modern society, the demand for transmission capacity in wireless communication systems is continuously increasing. According to the Cisco Visual Networking Index (VNI) forecast published in 2017, global mobile data traffic will rise from 7.2 exabytes per month at the end of 2016 to 49 exabytes per month by 2021 [1]. To keep pace with this trend, high-speed parallel digital signal processing (PDSP) technologies [2] have become increasingly prevalent in wireless communication systems in recent years.

In 2013, the Institute of Electronics, Microelectronics and Nanotechnology (IEMN) demonstrated an offline photoelectronic system [3] with an 8.2 gigabit per second (Gbps) communication rate. In 2012, the Fraunhofer Institute for Applied Solid State Physics (IAF) demonstrated an offline electronic system [4] with a 24 Gbps rate, and in 2014 the Nippon Telegraph and Telephone Corporation (NTT) of Japan achieved an offline bit rate of about 50 Gbps [5]. Despite their high complexity, PDSP architectures have attracted considerable research interest, though mostly in offline form. In optical fiber communication, recent literature has reported systems based on PDSP architectures [6,7]. However, to the best of our knowledge, only offline (non-real-time) PDSP architectures [8] have been realized so far.

Offline systems can be applied in many areas, such as High Definition (HD) movie distribution. However, many applications require real-time operation. For instance, an HD video call system has a large amount of data to process. If the baseband digital signal processing runs offline, the user on the receiver side has to wait a long time for processing before receiving the information sent from the transmitter side. For this reason, online (real-time) communication systems are becoming increasingly prevalent.

To achieve high throughput in real-time communication systems, high-order Quadrature Amplitude Modulation (QAM) [9,10] is considered key to improving spectral efficiency. High-order QAM systems, however, require quite complicated demodulator architectures, and hardware complexity and resource limitations are the main challenges in implementing such large real-time architectures. Field Programmable Gate Array (FPGA)-based PDSP architectures [2] are considered a promising solution to this problem, and novel architectures continue to emerge. Baseband PDSPs [11,12,13] are necessary in these demodulator architectures, and parallel timing synchronization is essential in baseband PDSPs.

In this article, an improved two-times-oversampling parallel timing synchronization architecture aimed at real-time performance is proposed and implemented on a Xilinx XC7VX485T FPGA chip. The key technology is PDSP on FPGA, which greatly reduces the system clock frequency and makes real-time performance feasible with existing hardware devices. Specifically, a parallel First In First Out (FIFO) structure based on an index-associated rearranging method and a dual feedback loop based on the Gardner algorithm are adopted in our parallel architecture.

The rest of this article is organized as follows. Section 2 describes several shortcomings of two existing parallel structures. Section 3 presents the improved parallel structure and its FPGA implementation. Section 4 presents the simulation and implementation results. Finally, Section 5 concludes the article.

2. Shortages of Existing Parallel Architectures

As discussed in Section 1, many offline communication systems with high transmission rates have been developed. Nonetheless, improving the communication rate of real-time systems remains a major open problem, especially with respect to baseband digital signal processing.

Most parallel structures are derived from serial algorithms. The Gardner [14,15,16] and Oerder and Meyr (O&M) [17] algorithms are the two most commonly used serial timing synchronization algorithms. Zhou and her collaborators developed a parallel Gardner architecture [8,18,19] for a two-times-oversampling coherent optical system. Nevertheless, this architecture could not achieve real-time performance, mainly because its Numerically Controlled Oscillator (NCO) structure could not meet FPGA timing requirements. In addition, its loop filter discards the information provided by all but one of the parallel error detectors, which leads to a large deviation in the recovered signals. To achieve real-time performance, Lin and his collaborators implemented a parallel O&M architecture on FPGA [20]. However, its four-times oversampling requirement consumes a large amount of hardware resources and requires higher-speed Analog-to-Digital Converters (ADCs) in parallel systems.

2.1. Non-Real Time

Zhou and her collaborators described their parallel structure in [8]. Although it was verified offline on the MATLAB platform, their digital signal processing (DSP) structure was not implemented on any hardware. The main obstacle to real-time operation is that information loss occurs in their structure once the timing error accumulates to a certain amount.

It is also very difficult to achieve real-time performance on Digital Signal Processors (DSPs). Take one of the highest-performance DSPs, the Texas Instruments (TI) C66x series, as an example. Its highest operating frequency is only 1.2 GHz, which makes it impossible to implement a real-time communication system at rates above 20 Gbps: with 16QAM and two-times oversampling, a 20 Gbps stream delivers 10 GSa/s on each of the I and Q branches, i.e., more than eight samples per 1.2 GHz clock cycle.

2.2. High Hardware Resource Consumption

From [17], a structure based on the O&M algorithm needs at least four times oversampling, because the O&M-based parallel error detector in [20] needs four sampled points to obtain one timing error.

3. Improved Parallel Architecture

The architecture of the serial Gardner algorithm can be found in [14,15]; this section presents the parallel architecture improvements. A parallel First In First Out (FIFO) structure based on an index-associated rearranging method and a dual feedback loop based on Gardner's algorithm are adopted in the proposed parallel architecture. The overall architecture is depicted in Figure 1.

To generate parallel digital signals, the analog I and Q signals are first sampled by two high-speed ADCs. The relation between the ADC sampling frequency and the sampling frequency of the parallel structure is given by Equation (1).

f_{adc} = 2m \cdot f_{s},

(1)

where f_{adc} is the sampling frequency of the high-speed ADCs, m is the number of parallel channels, and f_{s} is the sampling frequency of the parallel structure. The digital signals are stored in the parallel FIFO and then rearranged to keep the data flow stable. Afterwards, the interpolator outputs are fed to the timing error detectors (TEDs) to obtain timing errors, which are then filtered by the loop filter. Finally, the timing errors are compensated in the interpolators using the fractional interval and basepoint index provided by the NCOs.
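As a worked example of Equation (1), the Python sketch below derives the ADC rate from the parameters used in Section 4 (20 Gbps, 16QAM with 4 bits per symbol, two-times oversampling) and then the clock of the parallel structure for several degrees of parallelism. It assumes, as the clock frequencies quoted in Section 4 imply, that an "N-parallel" design handles N = 2m samples per clock on each branch (I or Q); the function names are illustrative only.

```python
def adc_rate(bit_rate, bits_per_symbol, oversampling):
    """Sample rate per I/Q branch: symbol rate times the oversampling factor."""
    symbol_rate = bit_rate / bits_per_symbol
    return symbol_rate * oversampling


def parallel_clock(f_adc, lanes):
    """Equation (1) solved for f_s, with lanes = 2m samples handled per clock."""
    return f_adc / lanes


f_adc = adc_rate(20e9, 4, 2)  # 20 Gbps, 16QAM, two-times oversampling -> 10 GSa/s
for lanes in (32, 64, 128):
    print(lanes, parallel_clock(f_adc, lanes) / 1e6, "MHz")
# 32 312.5 MHz
# 64 156.25 MHz
# 128 78.125 MHz
```

The 64-lane result, 156.25 MHz, is the nominal clock quoted for the 64-parallel design; Section 4.2 reports a FIFO read clock of 159.524 MHz, slightly above it.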

The FPGA implementation block diagram of the parallel architecture, including the delay and bit width of each module, is depicted in Figure 2.

The stability of the feedback loop is quite sensitive to the timing error, so the error-processing modules require bit widths of more than 8 bits, even though the source and output signals are all 8-bit. The FPGA resources consumed and the delay caused by each module are discussed in detail in Sections 3.1 and 3.2.

Unless otherwise specified, the following descriptions are based on an m-channel parallel module, and each adder and each multiplier introduces a delay of 1 and 3 clock cycles on the FPGA, respectively.

3.1. Parallel Preprocess

Parallel preprocessing, composed of the parallel FIFO, data rearrangement and data select modules, is responsible for the stability of the data flow.

3.1.1. Parallel FIFO

The source signals are stored in the parallel FIFO before all other processing to prevent information loss. In an m-parallel architecture, 2m FIFOs with 8 bits of write/read depth and 512 bits of write width for the I/Q signals are required on the FPGA.

3.1.2. Resource-Saving Data Rearrangement

Timing error is caused by timing frequency offset and timing phase offset. The error caused by a phase offset is constant, whereas a timing frequency offset makes the timing error increase or decrease without bound. To bound the error, source samples need to be deleted or kept once the error has accumulated to a certain amount. In a parallel structure, however, the parallel source signal sequence becomes disordered whenever a delete or keep operation occurs.

Data rearrangement is adopted to delete/keep source samples and then restore the disordered signals to the correct order. In [10], the authors adopted a parallel FIFO-based delete-keep method for this adjustment. However, as shown in the first column of Table 1, the subscripts of the indexes are variable, so a Look-Up Table (LUT) is consumed in every parallel channel. More seriously, the LUT consumption increases exponentially with the number of parallel channels.

To solve this problem, an index-associated method is proposed. From Figure 1, the increment of the subscript equals the increment of the index value. Thus, each variable index can be expressed as index(i) plus the corresponding offset, as listed in the second column of Table 1. With the proposed method, only one LUT is consumed, because the other LUTs in [10] can be replaced by add operations. In an m-parallel system, for instance, the LUT is needed only for index(i), and the other m - 1 LUTs are replaced with m - 1 adders.
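The index-associated idea can be illustrated with a small behavioral model. The Python sketch below is a hypothetical simplification, not the authors' RTL: it assumes a window holding two consecutive clock cycles of samples and a single base index (index(i) in Table 1) per output cycle, and lane k simply reads window[base + k], which in hardware corresponds to one lookup plus m - 1 constant adders. In this model, incrementing the base by one deletes a sample and decrementing it keeps (repeats) one.

```python
def rearrange(blocks, bases):
    """Re-align a parallel sample stream after delete/keep events.

    blocks: list of m-sample blocks, one per clock cycle
    bases:  one base index per output cycle (len(blocks) - 1 values), 0 <= base <= m
    """
    m = len(blocks[0])
    out = []
    for n in range(1, len(blocks)):
        window = blocks[n - 1] + blocks[n]  # samples from two consecutive cycles
        base = bases[n - 1]                 # index(i): the only "looked-up" index
        if not 0 <= base <= m:
            raise ValueError("base index outside the two-cycle window")
        # index-associated addressing: lane k reads index(i) + k
        out.extend(window[base + k] for k in range(m))
    return out


blocks = [list(range(4 * j, 4 * j + 4)) for j in range(4)]  # m = 4, samples 0..15
print(rearrange(blocks, [0, 0, 0]))  # [0, 1, ..., 11]: no slip
print(rearrange(blocks, [0, 1, 1]))  # [0, 1, 2, 3, 5, ...]: sample 4 deleted
print(rearrange(blocks, [1, 0, 0]))  # [1, 2, 3, 4, 4, ...]: sample 4 kept twice
```

Because every lane address is the single base index plus a constant, the per-lane cost is an adder rather than a lookup, which is where the LUT saving comes from.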

Since our work aims at a 20 Gbps communication system, a 64-parallel architecture is a good choice, because the FPGA clock then runs at around 156.25 MHz, which is not limited by current hardware. Table 2 therefore compares the synthesized FPGA resource utilization of the 64-parallel FIFO. It shows that the proposed method saves about 67% of the LUTs compared with the method in [10].
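The 67% figure follows directly from Table 2. The quick check below recomputes it, together with the corresponding flip-flop saving (which the text does not quote), from the tabulated counts; it is plain arithmetic on the published numbers, not a new measurement.

```python
# Counts copied from Table 2 (64-parallel FIFO on the XC7VX485T)
lut_ref, lut_new = 21_248, 6_881
ff_ref, ff_new = 3_175, 2_531
lut_available = 303_600


def saving(before, after):
    """Fractional reduction of the proposed design relative to the reference."""
    return (before - after) / before


print(f"LUT saving: {saving(lut_ref, lut_new):.1%}")  # LUT saving: 67.6%
print(f"FF saving: {saving(ff_ref, ff_new):.1%}")     # FF saving: 20.3%
print(f"LUT utilization: {lut_ref / lut_available:.2%} -> {lut_new / lut_available:.2%}")
# LUT utilization: 7.00% -> 2.27%
```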

3.1.3. Data Select

A data select module sends the rearranged source signals to the corresponding interpolators, each of which needs four samples. Specifically, in a parallel structure, four extra source samples from the beginning of the next clock cycle must be appended to the end of the current rearranged queue; otherwise, the last four interpolators would not have enough source data. A data select module needs 5m registers on the FPGA in a 2m-parallel system.

3.2. Parallel Dual Feedback Loop

The improved parallel dual feedback loop is composed of Parallel Modules (PMs) and a loop filter. Each PM has two interpolators, one TED and one NCO.

3.2.1. Parallel Module

Every PM needs five source samples: each of its two interpolators uses four, and the three samples in the middle are shared by both interpolators.

1: Coefficient Multiplier-Free Interpolator

As summarized in [15], in serial systems three multipliers/dividers are consumed each time the coefficients are updated with a cubic interpolator, and two with a four-point piecewise-parabolic interpolator (α = 1/2). The coefficients of the latter are given in Equation (2).

\begin{aligned} C_{-2} &= \tfrac{1}{2} x^{2} - \tfrac{1}{2} x, \\ C_{-1} &= -\tfrac{1}{2} x^{2} + \tfrac{3}{2} x, \\ C_{0} &= -\tfrac{1}{2} x^{2} - \tfrac{1}{2} x + 1, \\ C_{1} &= \tfrac{1}{2} x^{2} - \tfrac{1}{2} x, \end{aligned}

(2)

where x stands for the fractional interval μ_{k} provided by the NCO. Specifically, the term \tfrac{3}{2} x in Equation (2) is split into \tfrac{1}{2} x + x. All constant factors of the Farrow coefficients of the four-point piecewise-parabolic interpolator with α = 1/2 then become integer powers of 2 (1/2 or 1 in magnitude), so the corresponding multiplications/divisions can be implemented with shift operations on the FPGA [21]. In other words, the four-point piecewise-parabolic interpolator with α = 1/2 is the best choice for FPGA, as it balances maximum resource savings against minimal performance deterioration.

The FPGA implementation of the Farrow structure is shown in Figure 3, where D stands for hardware delay and the symbols correspond to those in Equation (2). Splitting the coefficient 3/2 into 1 + 1/2, as described above, introduces one extra delay. Since each adder introduces a delay of 1 and each multiplier a delay of 3, the total delay caused by the interpolator is 11.
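To make the shift-only coefficient argument concrete, here is a floating-point Python sketch of the four-point piecewise-parabolic interpolator with α = 1/2. It assumes Gardner's indexing convention y = Σ C_i(μ) x(m_k - i) for i = -2, ..., 1, with μ the fractional interval, evaluates Equation (2) directly, and compares the result with a Farrow (Horner) form in which every constant factor is 1/2 or 1, so only the two multiplications by μ would need hardware multipliers. The function names are illustrative, and in fixed-point hardware the halving would be an arithmetic right shift rather than a float division.

```python
def coefficients(mu):
    """Equation (2): piecewise-parabolic Farrow coefficients, alpha = 1/2."""
    return {
        -2: 0.5 * mu**2 - 0.5 * mu,
        -1: -0.5 * mu**2 + 1.5 * mu,
        0: -0.5 * mu**2 - 0.5 * mu + 1.0,
        1: 0.5 * mu**2 - 0.5 * mu,
    }


def interp_direct(x, mk, mu):
    """y = sum_i C_i(mu) * x[mk - i], i = -2..1 (assumed indexing convention)."""
    c = coefficients(mu)
    return sum(c[i] * x[mk - i] for i in (-2, -1, 0, 1))


def half(v):
    # Multiplication by 1/2; in fixed-point hardware this is a 1-bit right shift.
    return v / 2


def interp_farrow(x, mk, mu):
    """Same interpolant in Farrow (Horner) form with shift-only constants."""
    xm1, x0, x1, x2 = x[mk - 1], x[mk], x[mk + 1], x[mk + 2]
    v2 = half(x2 - x1 - x0 + xm1)             # mu^2 branch
    v1 = x1 + half(x1) - half(x2 + x0 + xm1)  # mu branch, 3/2 split as 1 + 1/2
    v0 = x0                                   # constant branch
    return (v2 * mu + v1) * mu + v0           # the only two true multiplications


samples = [0.3, -1.2, 0.7, 2.0, -0.4, 1.1]
for mu in (0.0, 0.25, 0.5, 0.9):
    print(mu, interp_direct(samples, 2, mu), interp_farrow(samples, 2, mu))
```

Both forms agree for every μ; at μ = 0 and μ = 1 the interpolant returns x[m_k] and x[m_k + 1], and it reproduces a straight-line input exactly.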

2: Error Detector

Gardner's timing error detector (TED) [16] is given by Equation (3).

\begin{aligned} e(n) &= I(n - 1/2)\,[I(n) - I(n - 1)] \\ &\quad + Q(n - 1/2)\,[Q(n) - Q(n - 1)], \end{aligned}

(3)

where e(n) is the timing error, I(n) and Q(n) are the real and imaginary parts of the interpolator output, and n - 1, n - 1/2 and n are three consecutive indexes.

In a parallel system, the TED becomes Equation (4).

\begin{aligned} e(n, i) &= I_{1}(n, i)\,[I_{2}(n, i) - I_{2}(n, i - 1)] \\ &\quad + Q_{1}(n, i)\,[Q_{2}(n, i) - Q_{2}(n, i - 1)], \end{aligned}

(4)

where e(n, i) is the timing error of the ith PM at time n. For the I signal, I_{1}(n, i) and I_{2}(n, i) are the outputs of the first and second interpolators of the ith PM at time n, and I_{2}(n, i - 1) is the output of the second interpolator of the (i - 1)th PM at time n. When i = 1, I_{2}(n, i - 1) stands for the output of the second interpolator of the last PM at time n − 1. The Q signal is defined in the same way.
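The parallel detector of Equation (4) amounts to the serial detector of Equation (3) evaluated for all PMs of a clock cycle at once, with the previous cycle's last strobe carried over. The Python sketch below checks that reading on a toy stream; it assumes complex samples I + jQ, that the first interpolator of each PM produces the mid-symbol sample and the second produces the symbol strobe, and the function names are hypothetical.

```python
def ted_serial(strobes, mids):
    """Equation (3): mids[k] lies halfway between strobes[k - 1] and strobes[k]."""
    errors = []
    for k in range(1, len(strobes)):
        d = strobes[k] - strobes[k - 1]
        errors.append(mids[k].real * d.real + mids[k].imag * d.imag)
    return errors


def ted_parallel(y1, y2, last_strobe_prev):
    """Equation (4) for one clock cycle.

    y1[i], y2[i]: first/second interpolator outputs of PM i in this cycle
    last_strobe_prev: second interpolator output of the last PM in the previous cycle
    """
    errors = []
    prev = last_strobe_prev
    for mid, strobe in zip(y1, y2):
        d = strobe - prev
        errors.append(mid.real * d.real + mid.imag * d.imag)
        prev = strobe  # PM i uses the strobe of PM i - 1
    return errors


# Toy stream: 8 symbols handled by 4 PMs per clock cycle (two cycles)
strobes = [complex(k % 3 - 1, (7 * k) % 5 - 2) for k in range(9)]
mids = [complex((5 * k) % 4 - 1.5, k % 2 - 0.5) for k in range(9)]
cycle1 = ted_parallel(mids[1:5], strobes[1:5], strobes[0])
cycle2 = ted_parallel(mids[5:9], strobes[5:9], strobes[4])
print(cycle1 + cycle2 == ted_serial(strobes, mids))  # True
```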

For FPGA implementation, each error detector contains two adders and two multipliers. The hardware block diagram is depicted in Figure 4.

In Figure 4, D stands for hardware delay and the symbols correspond to those in Equation (4). The total delay is 5, since each adder introduces a delay of 1 and each multiplier a delay of 3 (1 + 3 + 1 along the subtract, multiply and add path).

3: Simplified NCO

In [8], the overflow moments are obtained by comparators. Nevertheless, as discussed in Section 2, this comparison logic has difficulty meeting FPGA timing requirements. To achieve real-time performance, a direct-calculation-based parallel NCO structure is proposed. In [14], Gardner gives Equation (5) for the basepoint index m_{k}:

m_{k} = \mathrm{int}[k T_{i} / T_{s}],

(5)

where int[z] denotes the largest integer not exceeding z, T_{s} is the sample period before synchronization, and T_{i} is the synchronized sample period. Equation (5) can be rewritten as Equation (6):

m_{k + 1} = m_{k} + \mathrm{fix}[R + W(n) + \mu_{k}],

(6)

where fix[z] rounds z toward zero (for the non-negative arguments here it equals int[z]), R is half of the oversampling ratio, W(n) is the control word of the NCO at time n, and μ_{k} is the fractional interval. Thus, m_{k} of each parallel module can be calculated directly and exactly instead of with comparison logic.
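One way to read the direct-calculation idea is that, within a clock cycle in which the control word W(n) is constant, the basepoint and fractional interval of the j-th interpolant follow in closed form from the first one, so no chain of comparators is needed. The Python sketch below assumes that reading, uses exact rational arithmetic to avoid rounding artifacts, and takes the standard fractional-interval update μ_{k+1} = (R + W(n) + μ_k) - fix[R + W(n) + μ_k], which Equation (6) leaves implicit. The control-word value and function names are hypothetical; this is an illustration, not the authors' implementation.

```python
from fractions import Fraction
from math import floor


def nco_recursive(m0, mu0, step, count):
    """Equation (6), one interpolant after another; step = R + W(n)."""
    m, mu = m0, mu0
    points = []
    for _ in range(count):
        points.append((m, mu))
        total = step + mu
        inc = floor(total)  # fix[] equals floor[] for the non-negative totals here
        m, mu = m + inc, total - inc
    return points


def nco_direct(m0, mu0, step, count):
    """Every interpolant of the cycle computed independently (parallel-friendly)."""
    points = []
    for j in range(count):
        total = mu0 + j * step
        inc = floor(total)
        points.append((m0 + inc, total - inc))
    return points


R = 1                   # half of the two-times oversampling ratio
W = Fraction(1, 100)    # example control word (hypothetical value)
step = R + W
print(nco_recursive(0, Fraction(3, 4), step, 64) == nco_direct(0, Fraction(3, 4), step, 64))  # True
```

In this model, a basepoint increment of 2 skips a source sample and an increment of 0 (for a negative control word) reuses one; these are the delete and keep events that the rearrangement module of Section 3.1.2 absorbs.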

In a serial structure implemented on FPGA, the NCO only needs one 8-bit control signal to locate the initial position of the interpolator. In a parallel structure, however, an additional 2-bit control signal is required for rearrangement besides the initial position of each parallel module. In the proposed direct-calculation NCO, this 2-bit signal is simply the first two bits of the 8-bit control signal of the mth NCO, so no additional hardware resources are required, as depicted in Figure 1.

3.2.2. High Precision Loop Filter

A proportional-integral filter is employed in our structure. In [8], the information carried by all TEDs except the last one is discarded, and only the last TED's timing error serves as the input of the proportional element, which leads to a large deviation. To ensure accuracy, our work uses the average of all timing errors as the input of the loop filter; simulation results confirmed that the performance is better with the average value. The improved loop filter is described by Equations (7) to (9).

Proportional element

P_{n} = k_{1} \times (\mathrm{err}_{n, 1} + \mathrm{err}_{n, 2} + \dots + \mathrm{err}_{n, 32}),

(7)

Integral element

I_{n} = I_{n - 1} + k_{2} \times (\mathrm{err}_{n, 1} + \mathrm{err}_{n, 2} + \dots + \mathrm{err}_{n, 32}),

(8)

The output of the loop filter is then

W_{n} = P_{n} + I_{n},

(9)

where k_{1} and k_{2} are the coefficients of the proportional and integral elements, respectively, err_{n, i} is the error of the ith PM at time n, and W_{n} is the output of the loop filter at time n. The structure is depicted in Figure 5.

When implemented on FPGA, an error smoothing module is placed before the first adder, because the input of the loop filter is the average of the parallel timing errors, as mentioned above. An m-parallel system contains m/2 TEDs (here m is the total number of parallel channels, so the 64-parallel design of Equations (7) and (8) has 32 TEDs), and the smoother can therefore be realized with a log_{2}(m/2)-bit shift operation on the FPGA. Moreover, the first adder is an (m/2)-input adder, so in an m-parallel system the delay introduced by the smoother is log_{2}(m/2). To save hardware resources, the multipliers are replaced by shift operations on the FPGA, as described above: k_{1} and k_{2} are approximated by the nearest integer powers of 1/2. Although these approximations change the closed-loop bandwidth and the damping factor achieved by the system, they are reasonable, because a multiplier on the FPGA is itself realized with shift and add operations. In total, the loop filter consumes 1 + m/2 adders, and the corresponding delay is 2 + log_{2}(m/2).
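The Python sketch below models the loop filter of Equations (7) to (9) with integer arithmetic. It assumes the averaging described above (sum of the m/2 TED outputs followed by a log_{2}(m/2)-bit right shift) and gains k_1 = 2^{-p} and k_2 = 2^{-q}; the helper that picks the power of 1/2 rounds in the log2 domain, which is one possible reading of "nearest". The class, method and parameter names are hypothetical.

```python
import math


def nearest_half_power(k):
    """Shift s such that 2**-s approximates gain k (rounded in the log2 domain)."""
    return round(-math.log2(k))


class ShiftLoopFilter:
    """Proportional-integral loop filter using only adders and arithmetic shifts."""

    def __init__(self, n_ted, shift_p, shift_i):
        if n_ted & (n_ted - 1):
            raise ValueError("number of TEDs must be a power of two")
        self.avg_shift = n_ted.bit_length() - 1  # log2(m/2)
        self.shift_p = shift_p                   # k1 = 2**-shift_p
        self.shift_i = shift_i                   # k2 = 2**-shift_i
        self.integral = 0

    def step(self, errors):
        avg = sum(errors) >> self.avg_shift      # error smoother: adder tree + shift
        proportional = avg >> self.shift_p      # Equation (7)
        self.integral += avg >> self.shift_i    # Equation (8)
        return proportional + self.integral     # Equation (9): W_n


lf = ShiftLoopFilter(n_ted=32, shift_p=2, shift_i=4)
print([lf.step([64] * 32) for _ in range(3)])  # [20, 24, 28]
```

Python's >> on negative integers is an arithmetic (floor) shift, matching a sign-extending shift in hardware. For the 64-parallel case (32 TEDs), the expressions above give 1 + 32 = 33 adders and a delay of 2 + 5 = 7.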

4. Simulation and FPGA Implementation

As mentioned before, our work aims at a 20 Gbps wireless communication system. A 64-parallel architecture allows the FPGA clock to run at around 156.25 MHz. A 32-parallel system would require a 312.5 MHz FPGA clock, at which it is very difficult to ensure stable operation. A 128-parallel system needs the FPGA circuit to run at only 78.125 MHz, which today's FPGAs can easily sustain, but the system error grows drastically as the number of parallel channels increases. In addition, a number of parallel channels that is not a power of two wastes hardware resources on the FPGA chip during routing, because FPGAs are based on binary logic. Therefore, a 64-parallel architecture is the best choice for a 20 Gbps system.

The proposed algorithm is verified in a baseband communication system. The modulation type is 16QAM, the bit rate is 20 Gbps, the roll-off factor is 0.4, the oversampling frequency f_{s} is two times the symbol rate R_{s} (f_{s} = 2R_{s}), and the timing frequency offset and phase offset are 32 kHz and π, respectively. The parallel source signal is quantized to 8 bits. The BER performance of the improved parallel architecture simulated in MATLAB demonstrates its efficiency, and the parallel architecture implemented on the Xilinx XC7VX485T FPGA agrees with the simulation.

4.1. MATLAB Simulation

The constellation diagrams before and after timing synchronization are shown in Figure 6a and Figure 6b, respectively, with the SNR set to 20 dB. The converged constellation diagram indicates that the timing module works correctly.

Figure 7 shows the BER performance for the transmission of 100 frames of 16,384 bits each. The blue curve is the theoretical BER, and '*' and 'Δ' represent the MATLAB fixed- and floating-point simulations, respectively. The algorithm works efficiently, with a BER deterioration of less than 0.5 dB in floating-point and less than 1 dB in fixed-point simulation.
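For reference, a common closed-form approximation of the theoretical BER of Gray-coded 16QAM over an AWGN channel is P_b ≈ (3/8) erfc(sqrt(0.4 E_b/N_0)). The Python sketch below evaluates it and finds, by bisection, the E_b/N_0 needed for a target BER, which is how a horizontal deterioration such as 0.5 dB or 1 dB is read off a BER plot. Whether Figure 7 uses this approximation or an exact expression is not stated, so treat the formula as an assumption; the function names are illustrative.

```python
import math


def ber_16qam(ebn0_db):
    """Approximate BER of Gray-coded 16QAM in AWGN (standard approximation)."""
    ebn0 = 10 ** (ebn0_db / 10)
    return 0.375 * math.erfc(math.sqrt(0.4 * ebn0))


def ebn0_for_ber(target, lo=0.0, hi=20.0, iters=60):
    """Eb/N0 in dB at which ber_16qam reaches the target (BER falls with Eb/N0)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if ber_16qam(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


ref = ebn0_for_ber(1e-4)
print(f"theory needs {ref:.2f} dB for BER 1e-4")
print(f"a 0.5 dB / 1 dB deterioration means {ref + 0.5:.2f} dB / {ref + 1:.2f} dB")
```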

4.2. FPGA Implementation

The implementation is demonstrated on a Xilinx XC7VX485T FPGA chip. 128 parallel ROMs store the source data, emulating the 10 GHz ADCs. To compare the simulation with the FPGA implementation, a periodic source is embedded in the ROMs. The write and read clocks of the parallel FIFO module are set to 156.268 MHz and 159.524 MHz, respectively, by a Mixed-Mode Clock Manager (MMCM). The device utilization of the whole system is summarized in Table 3.

The fractional interval output is depicted in Figure 8a, where '*' and 'Δ' represent the MATLAB fixed-point simulation and the hardware behavioral simulation, respectively. The two signals overlap completely, which indicates that the hardware design is correct. To verify the accuracy further, the difference between the two signals is shown in Figure 8b; it is constantly zero, which proves that the NCO output in MATLAB is identical to that in the hardware behavioral simulation. Moreover, not only the fractional interval but all signals obtained in the behavioral simulation are identical to those in MATLAB, which confirms the correctness and effectiveness of the design.

The constellation diagram obtained on the Xilinx XC7VX485T FPGA chip is displayed in Figure 9a. Figure 9b shows the difference (imaginary part) between the interpolator outputs of the behavioral simulation and the FPGA implementation.

The difference cannot be guaranteed to be zero at all times, because the initial source signals cannot be precisely controlled on an FPGA chip. However, from about the 15,000th sample onward, the difference is always less than 2. This means that a quantization error of less than 1.5% is introduced for an 8-bit signal, which further confirms that the proposed algorithm works effectively on FPGA.

5. Conclusions

In this paper, we have proposed an improved parallel timing synchronization architecture to address the urgent need for higher transmission capacity in real-time wireless communication systems, and we have demonstrated that the proposed architecture can be implemented on an FPGA. Our work saves 67% of the LUT resources on the FPGA compared with earlier results. Meanwhile, the NCO is improved to meet the FPGA timing requirements by using direct calculation instead of the comparator-based structure of related work. The key technology is parallel signal processing, both in theory and in FPGA implementation. Parallelization over m channels reduces the system clock to 1/(2m) of that required in serial processing, which makes real-time performance feasible with existing hardware devices. Accordingly, a theoretical model of parallel digital timing synchronization was established. Simulation results for a 64-parallel, 20 Gbps, 16QAM system agree closely with the theoretical model, with BER deteriorations of less than 0.5 dB and 1 dB in floating- and fixed-point simulation, respectively. The FPGA implementation also shows excellent agreement with the simulation. Furthermore, the proposed algorithm is not limited to 64 parallel channels; higher capacity can be achieved with faster clocks or more parallel channels. The proposed structure may be further optimized in future work on high-capacity wireless communication.

Author Contributions

X.H.: Methodology, Software, Validation, Formal Analysis, Data Curation, Writing— Original Draft. C.L.: Conceptualization, Investigation, Writing—Review & Editing, Supervision, Project Administration, Funding Acquisition. Q.W.: Resources, Writing—Review & Editing, Supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China (Grant No. 2018YFB18000500) and the President Funding of the China Academy of Engineering Physics (No. YZJJLX2018009).

Acknowledgments

The authors would like to thank Zhu Zheng, Zheng Feng, Lei Zhao, Yang Yu, Jianfei An and Chengda Ren for helpful discussions.

Conflicts of Interest

The authors declare no conflict of interest.

References

[1] Cisco. Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2017–2022; White Paper Cisco Public: San Jose, CA, USA, 2019.
[2] Parhi, K.K. VLSI Digital Signal Processing: Systems Design and Implementation; A Wiley-Interscience Publication: Hoboken, NJ, USA, 1999; pp. 255–311.
[3] Blin, S.; Tohme, L.; Coquillat, D.; Horiguchi, S.; Minamikata, Y.; Hisatake, S.; Nouvel, P.; Cohen, T.; Pénarier, A.; Cano, F.; et al. Wireless Communication at 310 GHz using GaAs high-Electron-Mobility Transistors for Detection. J. Commun. Netw. 2013, 15, 559–568.
[4] Antes, J.; König, S.; Leuther, A.; Massler, H.; Leuthold, J.; Ambacher, O.; Kallfass, I. 220 GHz wireless data transmission experiments up to 30 Gbit/s. In Proceedings of the 2012 IEEE/MTT-S International Microwave Symposium Digest, Montreal, QC, Canada, 17–22 June 2012; pp. 1–3.
[5] Song, H.; Kim, J.; Ajito, K.; Kukutsu, N.; Yaita, M. 50-Gbps Direct Conversion QPSK Modulator and Demodulator MMICs for Terahertz Communications at 300 GHz. IEEE Trans. Microw. Theory Tech. 2014, 62, 600–609.
[6] Yang, T.; Shi, C.; Chen, X.; Zhang, M.; Ji, Y.; Hua, F.; Chen, Y. Linewidth-tolerant and multi-format carrier phase estimation schemes for coherent optical m-QAM flexible transmission systems. Opt. Express 2018, 26, 10599–10615.
[7] Gao, Z.; Zhou, M.; Reviriego, P.; Maestro, J.A. Efficient Fault-Tolerant Design for Parallel Matched Filters. IEEE Trans. Circuits Syst. II Express Briefs 2018, 65, 366–370.
[8] Zhou, X.; Chen, X. Parallel implementation of all-digital timing recovery for high-speed and real-time optical coherent receivers. Opt. Express 2011, 19, 9282–9295.
[9] Wu, Q.; Lin, C.; Lu, B.; Miao, L.; Hao, X.; Wang, Z.; Jiang, Y.; Lei, W.; Den, X.; Chen, H.; et al. A 21 km 5 Gbps real time wireless communication system at 0.14 THz. In Proceedings of the 2017 42nd International Conference on Infrared, Millimeter, and Terahertz Waves (IRMMW-THz), Cancun, Mexico, 27 August–1 September 2017; pp. 1–2.
[10] Lin, C.; Zhang, J.; Shao, B. A Multi-Gigabit Parallel Demodulator and Its FPGA Implementation. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 2012, 95, 1412–1415.
[11] Lin, C.; Shao, B.; Zhang, J. A high data rate parallel demodulator suited to FPGA implementation. In Proceedings of the 2010 International Symposium on Intelligent Signal Processing and Communication Systems, Chengdu, China, 6–8 December 2010; pp. 1–4.
[12] Cheng, C.; Parhi, K.K. Hardware Efficient Fast Parallel FIR Filter Structures Based on Iterated Short Convolution. IEEE Trans. Circuits Syst. I 2004, 51, 1492–1500.
[13] Cheng, C.; Parhi, K.K. Low Cost Parallel Adaptive Filter Structures. In Proceedings of the Conference Record of the 39th Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 30 October–2 November 2005; pp. 354–358.
[14] Gardner, F.M. Interpolation in digital modems. I. Fundamentals. IEEE Trans. Commun. 1993, 41, 501–507.
[15] Erup, L.; Gardner, F.M.; Harris, R.A. Interpolation in digital modems. II. Implementation and performance. IEEE Trans. Commun. 1993, 41, 998–1008.
[16] Gardner, F. A BPSK/QPSK Timing-Error Detector for Sampled Receivers. IEEE Trans. Commun. 1986, 34, 423–429.
[17] Oerder, M.; Meyr, H. Digital filter and square timing recovery. IEEE Trans. Commun. 1988, 36, 605–612.
[18] Zhou, X.; Chen, X.; Zhou, W.; Fan, Y.; Zhu, H.; Li, Z. All-Digital Timing Recovery and Adaptive Equalization for 112Gbit/s POLMUX-NRZ-DQPSK Optical Coherent Receivers. IEEE/OSA J. Opt. Commun. Netw. 2010, 2, 984–990.
[19] Fan, Y.; Chen, X.; Zhou, W.; Zhou, X. Parallel processing clock synchronization-dispersion equalization combining loop in 112Gb/s optical coherent receivers. In Proceedings of the 19th Annual Wireless and Optical Communications Conference (WOCC 2010), Shanghai, China, 14–15 May 2010; pp. 1–4.
[20] Lin, C.; Zhang, J.; Shao, B. A High Speed Parallel Timing Recovery Algorithm and Its FPGA Implementation. In Proceedings of the 2011 2nd International Symposium on Intelligence Information Processing and Trusted Computing, Hubei, China, 22–23 October 2011; pp. 63–66.
[21] Yan, X.; Wang, Q.; Hao, X.; Qin, K. A High-Efficiency Multiplierless Symbol Synchronization Algorithm for IEEE802.11x WLANs. Wirel. Pers. Commun. 2017, 94, 1737–1749.

Figure 1. 2m-Parallel Architecture of Proposed Algorithm.

Figure 2. Field Programmable Gate Array (FPGA) block diagram of improved parallel timing architecture.

Figure 3. Farrow structure for piecewise-parabolic interpolator (α = 1/2).

Figure 4. Implementation structure of the error detector.

Figure 5. Implementation structure of the loop filter with error smoothing.

Figure 6. Constellation diagrams before and after timing synchronization.

Figure 7. Bit Error Rate (BER) performance of the improved architecture.

Figure 8. Fractional interval output and its difference between MATLAB fixed-point simulation and behavioral simulation.

Figure 9. Constellation diagram on FPGA and difference (imaginary part) of the interpolator output between MATLAB fixed-point simulation and FPGA implementation.

Table 1. Index Relation.

In [10]	Associated Index
index(i)	index(i)
index(i+1)	index(i)+1
⋯	⋯
index(i+m-1)	index(i)+m-1

Table 2. Resource utilization comparison (64-parallel FIFO).

Method	LUT	FF
In [10]	21,248 (7.00%)	3175 (0.52%)
Proposed	6881 (2.27%)	2531 (0.42%)
Available	303,600	607,200

Table 3. Device utilization of the whole system.

Resource	Utilization	Available	Utilization%
LUT	35,228	303,600	11.60
LUTRAM	1375	130,800	1.05
FF	55,950	607,200	9.21
BRAM	143	1030	13.88
DSP48e	480	2800	17.14

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

