A Parallelized Phase-Frequency Detector based Modified LSPF-DPLL for Wireless Communication – Comparative Study with Modified LSPF-DPLL

In recent times, communication technologies have evolved at a brisk rate. 3G and 4G networks have already been widely deployed. However, with the increase in the number of users and applications, alternative strategies in the form of 5G systems have been devised. Thus, it is desirable to have communication systems which can provide low error levels with minimum delay. DPLL based systems play a versatile role in modern day communication receivers. Popular DPLL receiver packages based on Zero Crossing (ZC) and Least Square Polynomial Fitting (LSPF) techniques have recently been proposed to serve as standalone receivers in communication setups. Such systems involve intensive computations and thus show excellent error performance, but are bottlenecked by poor time performance. In a subsequent DPLL design proposed of late, the LSPF-DPLL system was aided by a Modified Phase Resolving Numerically Controlled Oscillator (MPR-NCO) to achieve both improved error and time performance. The DPLL system proposed here modifies this design by incorporating a Parallelized LSPF based Phase-Frequency Detector to achieve further improvements in time performance while maintaining the system's error performance.


Introduction
Digital Phase Locked Loops (DPLLs) are the sampled domain counterparts of Phase Locked Loops (PLLs), realized by means of a digital Phase-Frequency Detector (PFD) [1,2]. Popularly referred to as an FM demodulator, the DPLL plays diverse roles in communication systems. These include recovering carrier information in noncoherent systems where the channel state is not known to the receiver [2], symbol retrieval in coherent links with a known channel [3], and multi-clock synchronization and clock distribution to different sub-systems using the divider unit in PLLs, besides time synchronizing such systems [2]. DPLL based packages have therefore been emphasized as standalone means of reception in recent wireless communication setups [3].
A ZC-DPLL has been proposed in [4] for high Doppler environments which incorporates a hyperbolic nonlinearity block and a sigma-delta modulator unit which helps the loop filter to adapt better. That system exhibits improved time jitter performance and a widened lock-in range. The ZC-DPLL proposed for Rayleigh faded channels in [3] utilizes the results of the ZC algorithm in its PFD and retrieves the original phase of the transmitted QPSK symbols. The system exhibits excellent error rates, but the intensive Left/Right (L/R) Shift algorithm leads to poor time performance. Similarly, a DPLL based signal recovery system aided by a highly intensive L/R Shift algorithm based Numerically Controlled Oscillator (NCO) and an LSPF based PFD is presented in [2]. This system also shows excellent error performance but lacks time efficiency. An MPR-NCO based LSPF-DPLL system is proposed in [1] for wireless fading environments and emerges as an improved design over the existing LSPF-DPLL [2] with better error and time performance.
The recent trend in wireless communications, as can be seen in [1,4], is time efficient symbol recovery. Certain DPLL designs, however, emphasize significant reduction in BER levels and therefore involve intensive computations which lead to poor time performance [2,3]. The LSPF-DPLL in [2] additionally involves a hugely time-intensive LSPF based PFD unit. An MPR-NCO based LSPF-DPLL has recently been proposed in [1] and exhibits improved error and time performance over the existing design in [2]. The time efficiency improvement in [1] is solely due to the incorporation of the MPR-NCO. The design presented here tries to further improve system time performance by parallelizing the operations of the intensive LSPF based PFD in [1] using the different cores of a Quad Core processor while maintaining the system's error performance.
The remainder of this paper is organized as follows. Section 2 discusses the proposed system model for the Parallelized Phase Frequency Detector based Modified LSPF-DPLL system in terms of its crucial components and their functioning, the mathematical relationships involved and the parameter settings for simulation. The modifications made to parallelize the DPLL in [1] are also discussed there in detail. Section 3 analyzes the performance of the proposed system and presents a comparative study with the MPR-NCO based LSPF-DPLL system in [1]. Section 4 deals with the advantages and limitations of the proposed system, and finally Section 5 concludes the discussion.

System Model
The proposed DPLL system with parallelized PFD is subjected to degraded QPSK modulated symbols under the combined effects of Rayleigh and Rician faded wireless environments and Additive White Gaussian Noise (AWGN).

Figure 1. Parallelized PFD based Modified LSPF-DPLL
The architecture overview for the proposed DPLL system is depicted in Figure 1. It consists of a framer for symbol-wise operation, a Parallelized version of the LSPF based Phase-Frequency Detector (LSPF-PFD) in [1], a Root Approximator (RA) based Loop Filter (RA-LF) and an MPR-NCO, which are integrated for proper system functioning. QPSK dibit-phase and phase-dibit mapper units aid the proposed DPLL topology. The Framer unit splits the received signal into frames whose length equals the QPSK symbol period. The time-intensive LSPF-PFD proposed in [1,2] is modified here by parallelizing the PFD action in [1] using the different cores of a Quad core processor as shown in Figure 2. The time efficient parallelized PFD provides the best-fit coefficients $a_0$ to $a_6$ containing ZC information of the incoming signal. The RA-LF utilizes these coefficients to measure the frequency and phase counts associated with the incoming degraded symbols. It does so by extracting the roots and eigenvalues of the best-fit estimate. The MPR-NCO uses the frequency and phase counts from the RA-LF and the fitted signal from the LSPF-PFD and provides three crucial outputs: the zero-phase frequency matched signal which is fed back to the LSPF-PFD, the phase-frequency matched signal with the original modulation information, and the accurate phase value associated with the received symbol, computed from the phase count of the RA-LF. The NCO determines the accurate phase value using the lookup information of the dibit-phase mapper. The accurate phase is further mapped to the correct dibits using the phase-dibit mapper to reduce the need for complicated branched QPSK demodulation as suggested in [2,3,5].

Parallel PFD based LSPF-DPLL Analysis
The modified LSPF-DPLL with Parallelized PFD proposed for QPSK modulated symbols recovers digital intelligence under the degrading effects of Rayleigh and Rician fading channels and AWGN.

QPSK signal model
Quadri-phase signals significantly conserve bandwidth by transmitting dibits and also maintain low error rates using distinct decision boundaries as elaborated in [2,6,7]. The proposed system is thus inspected for QPSK modulated symbols. A QPSK modulated signal is mathematically represented as in eq. (1)

Rayleigh and Rician Fading Model
Rayleigh and Rician fading models provide accurate estimates of the wireless fading environment prevalent in urban, sub-urban and rural areas [5,7]. The proposed Parallel PFD based modified DPLL system is tested under the degrading effects of Rayleigh and Rician fading.
The received signal $s(t)$ under a Rayleigh faded NLOS transmission scenario between transmitter and receiver is depicted in eq. (2): where $a_k$ is the gain in the $k$th path, $\omega_c$ is the angular carrier frequency for transmission, $\omega_{dk}$ is the Doppler frequency due to relative motion between the transmitter and receiver for the $k$th path and $\phi_k$ is the random phase for the $k$th path [2,5].
Similarly, the received signal under a Rician faded wireless environment, composed of NLOS and LOS paths, is depicted in eq. (3): where $s(t)$ represents the contribution due to the NLOS components, $k_d$ indicates the strength of the LOS component, and $\omega_d$ is the Doppler frequency component in the direct path [1,2,5].
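The two fading models can be illustrated with a minimal sketch. Eqs. (2) and (3) are not reproduced in this text, so the forms used below are assumptions chosen only to be consistent with the symbol definitions above: a sum of path contributions $a_k \cos((\omega_c + \omega_{dk})t + \phi_k)$ for the NLOS case, and the same sum plus a direct term $k_d \cos((\omega_c + \omega_d)t)$ for the Rician case. The function names are hypothetical.

```python
import math

def rayleigh_nlos(t, gains, dopplers, phases, omega_c):
    """NLOS received signal at time t as a sum over paths (assumed form of eq. (2))."""
    return sum(
        a_k * math.cos((omega_c + w_dk) * t + phi_k)
        for a_k, w_dk, phi_k in zip(gains, dopplers, phases)
    )

def rician(t, gains, dopplers, phases, omega_c, k_d, omega_d):
    """NLOS sum plus a direct LOS term of strength k_d (assumed form of eq. (3))."""
    los = k_d * math.cos((omega_c + omega_d) * t)
    return rayleigh_nlos(t, gains, dopplers, phases, omega_c) + los
```

In a simulation, the path gains, Doppler shifts and random phases would be drawn per path, and AWGN would be added to the result before it reaches the DPLL.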

Component wise analysis of Proposed Model
The proposed Parallel PFD based Modified LSPF-DPLL depicted in Figure 1 incorporates five functional blocks as already mentioned in Section 2.1. The Framer Unit provides symbol-wise DPLL operation. The detailed functioning of the four major units is elaborated below.
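The Framer's symbol-wise splitting can be sketched as follows. This is a minimal illustration, assuming each symbol is represented by a fixed number of samples `ns`; the function name and the handling of a trailing partial frame are assumptions, not details given in the paper.

```python
def frame_signal(samples, ns):
    """Split a sample sequence into consecutive frames of ns samples each.

    Trailing samples that do not fill a whole frame are dropped (an
    assumption; the paper does not say how a partial frame is handled).
    """
    if ns <= 0:
        raise ValueError("ns must be positive")
    n_frames = len(samples) // ns
    return [samples[i * ns:(i + 1) * ns] for i in range(n_frames)]
```

Each returned frame would then be passed to the PFD as one symbol's worth of samples.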

Parallelized LSPF based PFD
The LSPF based PFD performs phase-frequency detection at moderate sampling rates compared to ZC based designs as suggested in [1,2]. The 6th order polynomial fitting PFD presented in [2], however, involves intensive computations and thus leads to poor time performance of the PFD. The design presented in [1] incorporates an MPR-NCO which leads to improvements in processing time. The phase-frequency detection process in [1] was still realized using the intensive LSPF unit, so in addition to the time efficiency achieved due to the MPR-NCO, there remained scope for further improvement in time performance by modifying the PFD unit in [1]. To achieve this improvement in system performance, the intensive computations involved in the serial LSPF-PFD unit of [1] are parallelized using the different cores of a Quad Core Processor. The parallelized version of the LSPF-PFD provides the best-fit coefficients $a_0$ to $a_6$, which are indicative measures of the phase and frequency content associated with the incoming symbol.
If $(t_1, y_1), (t_2, y_2), \dots, (t_n, y_n)$ represent the corrupted signal samples of the received QPSK symbols along with their time indices, the received symbol can be fitted to a 6th order polynomial such that the sum of squared residuals $S$ is minimized, as discussed in [2]. The fitted signal is mathematically expressed as in eq. (4): On partially differentiating $S$ in eq. (5) with respect to the best-fit coefficients $a_0$ to $a_6$, and equating each derivative to zero, an important matrix equality is obtained in eq. (6) from which the coefficients $a_0$ to $a_6$ can be obtained.
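The fitting step described above can be sketched in a few lines. This is a minimal illustration, assuming the fitted signal is $\hat{y}(t) = \sum_{k=0}^{6} a_k t^k$ and $S = \sum_i (y_i - \hat{y}(t_i))^2$, so that setting each partial derivative of $S$ to zero gives the normal equations $(V^T V)\,a = V^T y$ with $V_{ik} = t_i^k$. The function name `lspf_fit` is hypothetical, and NumPy is used here for convenience rather than because the paper prescribes it.

```python
import numpy as np

def lspf_fit(t, y, order=6):
    """Least-squares polynomial fit via the normal equations.

    Returns (a, y_hat): coefficients a[0..order] in ascending powers of t,
    and the fitted samples y_hat.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: column k holds t**k for k = 0..order.
    V = np.vander(t, order + 1, increasing=True)
    # Normal equations (V^T V) a = V^T y, i.e. dS/da_k = 0 for every k.
    a = np.linalg.solve(V.T @ V, V.T @ y)
    return a, V @ a
```

The normal equations mirror the derivation in the text, but they can become ill-conditioned for large raw time indices; rescaling `t` to a small interval, or using `np.linalg.lstsq` on `V` directly, is a common remedy.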

Eq. (6) provides the optimum LSPF coefficients $a_0$ to $a_6$, which contain the phase and frequency content of the received QPSK symbol. Moreover, it yields the LSPF best-fit curve, which was additionally incorporated in [1] to obtain information about whether the signal phase being tracked is below or beyond $\pi$ radians. As can be seen, a number of intensive computations are involved in eq. (6) to compute the best-fit LSPF coefficients. Sequential execution of such intense computations using a single core of the processor leads to heavily degraded time performance. So, these computations of the PFD are distributed for parallel execution on multiple processor cores as depicted in Figure 2. A Quad core processor is used to achieve speed up in the PFD and thus an overall time performance improvement over [1]. The proposed system uses a batch mode of processing, initiating a batch consisting of $n$ cores. Each PFD computation is termed a job. The batch uses one core to execute its own workload distribution and result accumulation process, whereas the remaining $n-1$ cores execute $n-1$ jobs in parallel at a time as shown in Figure 2. Upon completion of all the necessary jobs (computations), their results are combined to provide the PFD output, i.e., the LSPF coefficients $a_0$ to $a_6$ (fed to RA-LF) and the fitted signal ŷ (fed to MPR-NCO). It is important to note that the computations in eq. (6) are performed per symbol of the received QPSK signal. The parallelization and subsequent speed up are therefore achieved for each received symbol. This per-symbol speed up accumulates into an overall time performance improvement for a standard transmission with multiple QPSK symbols [2]. The computation time is directly related to the number of signal samples $n_s$ used to represent each incoming symbol, and thus the variation of processing time with $n_s$ has been analyzed.
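The batch scheme described above, with one coordinating core and $n-1$ workers, can be sketched as follows. The paper does not state how the computations of eq. (6) are divided into jobs, so this sketch assumes one plausible division: the samples of a symbol are split into $n-1$ chunks, each job computes its chunk's contribution to the normal-equation sums, and the coordinator accumulates the partial results and solves for the coefficients. A thread pool is used as a portable stand-in for per-core execution; a process pool could be substituted. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def partial_sums(t_chunk, y_chunk, order=6):
    """One job: this chunk's contribution to V^T V and V^T y."""
    V = np.vander(t_chunk, order + 1, increasing=True)
    return V.T @ V, V.T @ y_chunk

def parallel_lspf(t, y, n_cores=4, order=6):
    """Fit using n_cores - 1 workers; the caller acts as the batch core."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    n_workers = max(1, n_cores - 1)
    # Workload distribution: one chunk of samples per worker.
    t_chunks = np.array_split(t, n_workers)
    y_chunks = np.array_split(y, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(
            lambda tc, yc: partial_sums(tc, yc, order), t_chunks, y_chunks))
    # Result accumulation: the partial sums simply add up.
    A = sum(r[0] for r in results)
    b = sum(r[1] for r in results)
    a = np.linalg.solve(A, b)
    y_hat = np.vander(t, order + 1, increasing=True) @ a
    return a, y_hat
```

Because the normal-equation sums are additive over samples, the parallel result matches a serial fit up to floating-point rounding, which is consistent with the aim of preserving the error performance of the serial design.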

RA based LF
The Loop Filter Unit for the proposed DPLL is realized using a Root Approximation unit as proposed in [1,2]. It utilizes a matrix $M_{RA}$ formed by the coefficients $a_0$ to $a_6$ from the PFD unit to compute the roots signifying the signal zero crossing indices as shown in eq. (7) below.
From the matrix in eq. (7), six eigenvalues $e_1$ to $e_6$ are computed, and the six roots $r_1$ to $r_6$ are then evaluated using eq. (8):
$$r_i = \frac{1}{e_i}, \quad i = 1, \dots, 6 \qquad (8)$$
As elaborated in [1,2], the frequency and phase counts, $f_{cnt}$ and $p_{cnt}$, are computed from the significant roots $r_2$ to $r_5$ using eq. (9) and (10). The MPR-NCO uses the computed values of $f_{cnt}$ and $p_{cnt}$ for further processing as shown in Figure 1.
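The eigenvalue-based root extraction can be illustrated with a short sketch. The matrix of eq. (7) is not reproduced in this text, so the construction below is an assumption: a companion-type matrix whose first row is $-a_1/a_0, \dots, -a_6/a_0$ with ones on the sub-diagonal. Its eigenvalues are the reciprocals of the roots of $a_0 + a_1 t + \dots + a_6 t^6$, which agrees with the reciprocal relation between roots and eigenvalues in eq. (8). The function name is hypothetical.

```python
import numpy as np

def roots_from_coefficients(a):
    """Roots of a_0 + a_1 t + ... + a_n t^n via eigenvalues, r_i = 1/e_i."""
    a = np.asarray(a, dtype=float)
    if a[0] == 0:
        raise ValueError("a_0 must be non-zero for the reciprocal form")
    n = len(a) - 1
    M = np.zeros((n, n))
    M[0, :] = -a[1:] / a[0]      # first row from the normalized coefficients
    M[1:, :-1] = np.eye(n - 1)   # ones on the sub-diagonal
    e = np.linalg.eigvals(M)     # e_i are the reciprocals of the roots
    return 1.0 / e
```

The sketch assumes the leading coefficient is non-zero as well; otherwise an eigenvalue of zero appears and the corresponding root is unbounded. Selecting the significant roots and computing $f_{cnt}$ and $p_{cnt}$ follow eqs. (9) and (10), which are not shown here.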

MPR-NCO
The Parallelized DPLL system incorporates the Modified NCO proposed in [1] to compute accurate phase values associated with incoming QPSK symbols by avoiding sample-wise reorientation [8,9,10], resulting in improved time as well as error performance [1]. The MPR-NCO uses $f_{cnt}$ and $p_{cnt}$ from the RA-LF, ŷ from the Parallelized PFD unit and the possible phase values from the look-up table of the QPSK dibit-phase mapper unit. It performs the following steps for proper phase and frequency matching as suggested in [1].
- It uses $f_{cnt}$ to regenerate the zero-phase carrier, which acts as the free running NCO signal.
- It studies the first zero crossing of ŷ to set a flag $f$ as either '0' or '1' as elaborated in [1].
- The estimated phase $p_{est}$ is then computed using eq. (11) in a similar manner to [1]:
$$p_{est} = p_{cnt} + (f \times \pi) \qquad (11)$$
- The accurate phase $p_{acc}$ is finally computed as the phase value from the look-up table having minimum difference with $p_{est}$. Using $p_{acc}$ and $f_{cnt}$, the modified NCO generates the phase-frequency matched signal. $p_{acc}$ is fed to the phase-dibit mapper unit for simpler bit recovery [1].
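The phase resolution steps listed above can be sketched as follows. Two assumptions are made and should be read as such: eq. (11) is taken in the additive form $p_{est} = p_{cnt} + f\pi$ with $f \in \{0, 1\}$, and the look-up table is taken to hold the four phases $\pi/4$, $3\pi/4$, $5\pi/4$ and $7\pi/4$, since Table 1 is not reproduced in this text. The "minimum difference" is implemented as a circular distance, which is also a design choice of this sketch.

```python
import math

# Hypothetical stand-in for Table 1; the paper's actual phase set may differ.
QPSK_PHASES = [math.pi / 4, 3 * math.pi / 4, 5 * math.pi / 4, 7 * math.pi / 4]

def estimate_phase(p_cnt, flag):
    """Assumed form of eq. (11): p_est = p_cnt + flag * pi, flag in {0, 1}."""
    return p_cnt + flag * math.pi

def accurate_phase(p_est, table=QPSK_PHASES):
    """Look-up phase with minimum circular distance to p_est."""
    def circ_dist(a, b):
        d = abs(a - b) % (2 * math.pi)
        return min(d, 2 * math.pi - d)
    return min(table, key=lambda p: circ_dist(p, p_est))
```

Snapping to the nearest table entry is what lets the NCO avoid sample-wise reorientation: a single comparison per symbol replaces an iterative shift of the signal.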

QPSK dibit-phase and phase-dibit mapper
A dibit-phase and phase-dibit mapper unit similar to that proposed in [1] is incorporated here for simpler bit recovery by avoiding the branched demodulation structure suggested in [2,11,12]. The dibit-phase mapper aids NCO action by providing the different possible phases, whereas the phase-dibit mapper uses the look-up table of dibits and phases shown in Table 1 to map $p_{acc}$ to the correct dibits, facilitating simpler recovery of digital data.

Simulation Parameter Settings
The Modified LSPF-DPLL system proposed with Parallelized PFD is tested under a non-coherent transmission scenario. Table 2 shows the different parameters and corresponding specifications under which the proposed system has been simulated. As depicted in Table 2, the number of cores used for parallel execution of the load is varied between three and four. The mode of processing is batch type, so one core is always used to execute the batch process. Therefore, when three cores are used, the PFD load is effectively distributed between two cores. Similarly, the load is divided between three cores when the total number of cores used is four.

Results and Discussion
The analysis presented in this section can be broadly classified into two parts. Firstly, the system performance improvement due to parallelization of the PFD unit is studied in terms of CPU utilization, processing time requirement and percentage speed up achieved with respect to the serial DPLL [1]. Secondly, the error performance of the system is analyzed in terms of a comparative BER analysis under different fading conditions to ensure that the BER levels of the Serial DPLL [1] are preserved. Table 3 shows an average improvement of 53% in CPU utilization for the Parallel DPLL over the Serial DPLL and a speed up as high as 72% with 4 cores.
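For reference, one way to compute a percentage speed up from measured processing times is sketched below. It assumes the metric is the relative reduction in processing time with respect to the serial run; the paper's exact definition is not given in this text, and the function name is hypothetical.

```python
def percent_speedup(t_serial, t_parallel):
    """Percentage reduction in processing time relative to the serial run."""
    if t_serial <= 0:
        raise ValueError("t_serial must be positive")
    return 100.0 * (t_serial - t_parallel) / t_serial
```

Under this definition, a purely illustrative pair of timings such as 10 s serial and 2.8 s parallel corresponds to a speed up of about 72%; these timings are not measurements from the paper.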

Advantages and Limitation
The proposed system has certain advantages and also some limitations which are outlined below.

Advantages
In addition to possessing the advantages suggested in [1], the system also exhibits the following features:
- Reduced computational load per core due to distribution of PFD operations across the different cores of the Quad core processor.
- Further improvement in processing speed as compared to the Serial DPLLs presented in [1,2].
- Improved CPU resource utilization over the systems proposed in [1,2]; serves as a dedicated receiver package for DPLL based communications.

Limitation
In some specific cases where either the sample size per QPSK symbol or the total number of such symbols transmitted is comparatively smaller than the data sizes for a standard transmission, the parallelization overheads may consume more time than the parallel execution saves, leading to ineffective parallelization and thus degraded time performance.
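This limitation can be made concrete with a simple timing model. The model is an illustrative assumption rather than one taken from the paper: $T_s$ denotes the serial PFD time per symbol, $T_o$ the overhead of workload distribution and result accumulation, and the load is taken to divide evenly over the $n-1$ worker cores.

```latex
T_p = \frac{T_s}{n-1} + T_o ,
\qquad
T_p < T_s
\;\Longleftrightarrow\;
T_o < \frac{n-2}{n-1}\, T_s .
```

Under this model, parallelization with four cores pays off only while the overhead stays below two thirds of the serial time, and with three cores only while it stays below one half. Since $T_s$ shrinks with the number of samples per symbol while $T_o$ does not shrink in proportion, small sample sizes are where the condition is most likely to fail.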

Conclusion and Future Direction
This paper proposes a Parallelized PFD based LSPF-DPLL system. The previously proposed Serial DPLL with MPR-NCO [1] is modified here by incorporating a new parallel processing aided LSPF-PFD unit to track accurate phase values time efficiently for retrieving digital intelligence. The system is tested under varying conditions of Rayleigh and Rician fading. Specifically, the performance of the PFD unit is rigorously tested with different numbers of cores and with varying sample sizes. The proposed DPLL system is found to exhibit improvements in time performance, percentage speed up and CPU utilization over the Serial DPLL in [1], making it suitable for today's technology, which is focused on minimizing delays. In addition to this, the system also maintains BER levels similar to those reported in [1]. The proposed work could be extended further by analyzing the system under realistic estimates of Nakagami-m fading channels, which combine Rayleigh and Rician behaviour.