DLL-based 4-phase duty-cycle and phase correction circuit for high frequency clock tree

In high-speed data transmission applications such as double data rate (DDR) memory and double-sampling ADCs, clock generation and distribution circuits must provide clocks with a precise 50% duty cycle and sufficient timing margin. The proposed DLL-based 4-phase duty-cycle and phase correction circuit, consisting of a delay-locked loop (DLL) and a 4-phase clock generator built from SR latches, corrects a distorted input clock to a 50% duty cycle. The distorted input clock passes through the DLL; after the DLL locks, the delay of the delay line equals the period of the input clock. Finally, 4-phase, 50% duty-cycle clocks are generated by combining the rising edges of the signals at each quarter point of the delay line. The proposed circuit is implemented in 65nm CMOS. Simulation results show that the circuit operates from 550 MHz to 1600 MHz and that the maximum duty-cycle error of the output clock is less than 1% for input duty cycles ranging from 25% to 80%. The spacing between adjacent phases of the 4-phase output clock is 250 ± 3 ps at 1 GHz. The measured power dissipation is 4.3 mW.


Introduction
Recently, game consoles and high-performance TVs have required high-bandwidth DRAM to improve processing speed. However, as the operating frequency increases, growing power consumption and shrinking timing margins have become problems. One solution is to use multi-phase clock signals. However, non-idealities such as jitter, skew, and PVT variations distort the duty cycle and/or introduce phase skew between the multi-phase clocks. For this reason, duty-cycle correction (DCC) and phase correction (PC) are essential blocks in circuits that use multi-phase clock signals.
DCCs have been studied extensively and can be classified into analog and digital types. Analog DCCs usually employ a pulse-width control loop (PWCL) [1]-[3]. Because of their analog feedback loop, these DCCs achieve higher correction accuracy, but they suffer from relatively slow settling behaviour, loop instability, and charge-pump mismatch.
All-digital duty-cycle correctors (ADDCCs) use two major architectures: synchronous mirror delay (SMD) and time-to-digital converter (TDC). In addition, some applications combine a DCC with a delay-locked loop (DLL) to eliminate the phase error.
The SMD-based ADDCCs [4]-[5] use a half-cycle delay line (HCDL) composed of a matched delay line and an SR latch to generate a 50% duty-cycle clock. However, the generated pulse width depends on the matching between two delay lines, so it is easily affected by process corners, which causes mismatch.
The TDC-based DCCs [6]-[8] quantize the period of the input clock into a digital code and then use this code to generate a half-cycle delay. TDC-based DCCs can reduce the lock-in time; however, the output duty-cycle error depends on the TDC resolution. Furthermore, it is difficult to design a TDC that offers both high resolution and a wide range at low power.
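To make the resolution dependence concrete, the following is a minimal sketch of a generic TDC-based DCC, not a model of any specific design in [6]-[8]. It assumes the TDC truncates the period to an integer code with a uniform LSB and that the half-cycle delay is built from half of that code (rounded down); the function name is hypothetical.

```python
import math


def tdc_dcc_duty_error(period_ps, tdc_lsb_ps):
    """Duty-cycle error (fraction of the period) of an idealized TDC-based DCC.

    Hypothetical model: the TDC truncates the period to an integer code,
    and the half-cycle delay is built from half of that code (rounded down).
    """
    code = math.floor(period_ps / tdc_lsb_ps)      # TDC: period -> digital code
    half_delay_ps = (code // 2) * tdc_lsb_ps       # half-cycle delay from the code
    # The output falling edge lands at half_delay_ps instead of period_ps / 2.
    return abs(half_delay_ps - period_ps / 2) / period_ps


for lsb in (30.0, 7.0):
    print(f"LSB = {lsb} ps -> duty error = {tdc_dcc_duty_error(1000.0, lsb):.3%}")
```

For a 1000 ps period, a 30 ps LSB gives a 2% duty-cycle error while a 7 ps LSB gives 0.3%. In this model the error is always smaller than one LSB divided by the period, so a finer TDC (and the power and range cost that comes with it) is the only way to reduce it.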
Although these DCCs produce a 50% duty-cycle output clock, the phases of the input and output clocks are misaligned. To solve this problem, a DCC combined with a DLL has been proposed [9]. However, such a design is challenging, because the complexity of the DLL must be minimized to reduce power consumption and area.
The previously introduced circuits are single-phase DCC architectures, and some of them are difficult to implement in a multi-phase clock scheme. In this paper, a DLL-based 4-phase DCC and PC circuit for multi-phase interfaces is proposed. The scheme uses a DLL to detect the input clock period, and the feedback loop of the DLL guarantees precise detection. Then, 50% duty-cycle signals with 90° phase spacing are generated by combining the rising edges of four signals from the delay line. Because an analog DLL is used, the proposed design achieves relatively precise duty-cycle and phase correction. Furthermore, mismatch-induced errors are eliminated by a current-mismatch-reduced charge pump and the structure of the clock generator.
The rest of this paper is organized as follows. The architecture of the proposed 4-phase DCC and PC and a detailed circuit description are presented in Section 2.

Circuit operation
The input clock passes through the voltage-controlled delay line (VCDL), and the phase frequency detector (PFD) compares the input clock with the delayed clock to align the rising edges of the two signals. The PFD detects the phase and frequency error between the input clock and the delayed clock, and its UP/DOWN outputs control the charge pump (CP), which converts the phase error into a control voltage. The control voltage adjusts the delay of the VCDL to eliminate the phase error between the input clock and the delayed clock. When the phase error is removed, the DLL is locked.
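The locking behaviour can be illustrated with a minimal discrete-time model. This is a sketch under simplifying assumptions, not the authors' circuit: the PFD and CP are linearized into a single per-cycle update of the control voltage proportional to the phase error, the VCDL delay is assumed to rise linearly with the control voltage, and the initial delay is assumed to lie close enough to one period that the loop does not false-lock to a multiple of it. The function name and gain values are hypothetical.

```python
def simulate_dll_lock(t_in_ps, t_del0_ps, k_pd_v_per_ps=0.001,
                      k_vcdl_ps_per_v=400.0, cycles=100):
    """Return the VCDL delay (ps) seen in each reference cycle.

    Hypothetical linearized DLL: the PFD + CP pair moves the control voltage
    by k_pd_v_per_ps * (phase error) per cycle, and the VCDL delay is
    assumed to rise linearly with the control voltage.
    """
    v_ctrl = 0.0
    delays = []
    for _ in range(cycles):
        t_del = t_del0_ps + k_vcdl_ps_per_v * v_ctrl
        delays.append(t_del)
        # Positive error: the delayed edge arrives before the next input
        # edge, so the delay is too short and the loop pumps "UP".
        err_ps = t_in_ps - t_del
        v_ctrl += k_pd_v_per_ps * err_ps
    return delays


delays = simulate_dll_lock(t_in_ps=1000.0, t_del0_ps=800.0)  # 1 GHz input
print(f"first: {delays[0]:.1f} ps, last: {delays[-1]:.3f} ps")
```

In this model the error shrinks by a factor (1 - k_pd·k_vcdl) every cycle, so the per-cycle loop gain (0.4 with the defaults) must stay between 0 and 2 for the delay to converge to the input period.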
After the DLL is locked, the delay between the input clock and the delayed clock equals the period of the input clock. The signals at every quarter point of the VCDL are applied to the inputs of the clock generator. The rising edges of these signals (S1-S4) trigger pulse generators that produce short pulses, and the SDC converts each pulse into a pair of differential short pulses. The differential pulses drive two types of SR latch. To obtain a 50% duty-cycle clock, the SR latches sense the rising edges of input pairs (S1/S3 or S2/S4) whose rising edges are 180° apart. The latch outputs then enter a latched comparator, whose outputs are the 4-phase clock signals with a 50% duty cycle.
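Why only rising edges matter can be checked with an idealized timing model. This sketch assumes an ideal locked VCDL (taps S1-S4 are the input delayed by 0, T/4, T/2, and 3T/4), zero-delay SR latches, and 1 ps sampling; it omits the pulse generator, SDC, interpolation, and latched comparator. All function names and the output labels I/Q/IB/QB are hypothetical.

```python
def clock_wave(period, duty, delay, n_samples):
    """Sampled clock, 1 sample per ps: high for duty*period after each edge."""
    high = round(duty * period)
    return [1 if (t - delay) % period < high else 0 for t in range(n_samples)]


def rising_edges(wave):
    """Sample indices where the waveform goes from 0 to 1."""
    return [t for t in range(1, len(wave)) if wave[t - 1] == 0 and wave[t] == 1]


def sr_latch_output(set_edges, reset_edges, n_samples):
    """Ideal zero-delay SR latch: Q rises on a set edge, falls on a reset edge."""
    events = {t: 1 for t in set_edges}
    events.update({t: 0 for t in reset_edges})
    q, out = 0, []
    for t in range(n_samples):
        q = events.get(t, q)
        out.append(q)
    return out


def duty_cycle(wave, period, skip_periods=2):
    """Fraction of time high over whole periods, after a start-up window."""
    start = skip_periods * period
    n_full = (len(wave) - start) // period
    window = wave[start:start + n_full * period]
    return sum(window) / len(window)


def four_phase_from_taps(period, input_duty, n_periods=10):
    """Ideal locked VCDL: tap S(i+1) is the input delayed by i*T/4."""
    assert period % 4 == 0, "period (in samples) must split into quarters"
    n = n_periods * period
    taps = [clock_wave(period, input_duty, i * period // 4, n) for i in range(4)]
    edges = [rising_edges(w) for w in taps]
    # Each latch is set by one tap and reset by the tap 180 degrees later.
    return {name: sr_latch_output(edges[s], edges[(s + 2) % 4], n)
            for name, s in (("I", 0), ("Q", 1), ("IB", 2), ("QB", 3))}


outs = four_phase_from_taps(period=1000, input_duty=0.3)  # 1 GHz, 30% input duty
print({name: duty_cycle(w, 1000) for name, w in outs.items()})
```

Changing `input_duty` to any value between 0 and 1 leaves the output duty cycles at 0.5 and the output rising edges 250 ps apart, because the falling edges of the distorted input never reach the latches.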

Circuit description
In this section, the DCC and PC circuits are described in detail. The DLL block is examined first, followed by an in-depth description of the clock generator.

Voltage controlled delay line (VCDL)
Figure 2 shows the VCDL, which consists of current-starved delay cells. Current-starving MOSFETs on both the NMOS and PMOS sides control the rise and fall times of each inverter.
The total delay is $T_{del} = N \cdot T_{ck}$, where $N$ is the number of delay cells and $T_{ck}$ is the delay of a unit cell, and the delay difference between adjacent taps (S1-S4) is $T_{del}/4$. Therefore, when the DLL is locked, $T_{del}$ equals the input clock period, and the delay difference between adjacent taps is 1/4 of the input clock period (a 90° phase difference). Each tap has a dummy cell to balance the capacitive loading (the dummy cells are not shown in Figure 2).
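The tap arithmetic can be written out directly. This sketch assumes the four taps sit at exactly 1/4, 2/4, 3/4, and 4/4 of the line (which end is labelled S1 is not fixed here), so $N$ must be a multiple of 4; the value N = 16 used below is hypothetical, since the cell count is not given in the text, and the function names are illustrative.

```python
def tap_delays(n_cells, t_cell_ps):
    """(delay_ps, phase_deg) at the 1/4, 2/4, 3/4 and 4/4 points of the VCDL.

    Phase is relative to one full line delay T_del = N * T_ck, which equals
    the input period once the DLL is locked.
    """
    if n_cells % 4:
        raise ValueError("n_cells must be a multiple of 4 for equal quarter taps")
    t_del = n_cells * t_cell_ps
    taps = []
    for q in range(1, 5):
        d = q * t_del / 4
        taps.append((d, (360.0 * d / t_del) % 360.0))
    return taps


def required_cell_delay_ps(freq_hz, n_cells):
    """Unit-cell delay T_ck needed for lock: N * T_ck = 1 / f_in."""
    return 1e12 / (freq_hz * n_cells)


print(tap_delays(16, 62.5))  # hypothetical N = 16 at 1 GHz
for f in (550e6, 1.6e9):
    print(f"{f / 1e6:.0f} MHz -> T_ck = {required_cell_delay_ps(f, 16):.1f} ps")
```

With the hypothetical N = 16, the stated 550 MHz to 1.6 GHz range would require the unit-cell delay to tune from about 39 ps to about 114 ps; a different N scales this range inversely.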

Phase frequency detector (PFD)
The conventional phase frequency detector shown in Figure 3 is used. It detects the phase error between the input clock and the delayed clock and, because it is built from edge-triggered D flip-flops, it is insensitive to the duty cycle of its inputs. When the input clock rises, the output of its positive-edge-triggered D flip-flop goes high and asserts UP. Similarly, when the delayed clock rises, the output of the other D flip-flop goes high and asserts DOWN. When both UP and DOWN are high, both D flip-flops are reset and UP and DOWN return low. Once the input phase error is eliminated, UP and DOWN become identical narrow pulses that start together when both input clocks rise.

Charge pump (CP)
Figure 4 shows the current-mismatch-reduced charge pump. This circuit converts the phase error detected by the PFD into the control voltage of the delay line. The UPB and DN signals turn on MOSFET switches that provide a current path to node V_ctrl, charging or discharging it to move V_ctrl up or down. Ideally, I_up and I_dn are equal, but several non-idealities cause current mismatch, and two op-amps are used to address them. The first is charge sharing: when the UPB and DN switches are open, nodes X and Y are biased by a unity-gain amplifier, which suppresses charge sharing. The second is mismatch caused by variation of V_ctrl, which is reduced by controlling the bias voltage of the current source: an op-amp senses V_ctrl and modulates the bias voltage of the I_up source, thereby reducing the current mismatch.
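A behavioural sketch of the PFD and CP together shows why current mismatch matters. It assumes an ideal tri-state PFD with a fixed reset delay t_reset, ideal switches, and constant currents I_up and I_dn; the function names are hypothetical. Because both outputs stay high for t_reset even at zero phase error, any mismatch between I_up and I_dn forces the loop to settle at the nonzero phase error where the net charge per cycle is zero.

```python
def pfd_pulse_widths(t_ref_ps, t_fb_ps, t_reset_ps):
    """UP/DN pulse widths of an ideal tri-state PFD for one pair of edges.

    UP rises on the input (reference) edge, DN on the delayed (feedback)
    edge; once both are high, both reset after t_reset_ps.
    """
    t_clear = max(t_ref_ps, t_fb_ps) + t_reset_ps
    return t_clear - t_ref_ps, t_clear - t_fb_ps


def cp_net_charge(up_ps, dn_ps, i_up, i_dn):
    """Net charge delivered to V_ctrl: UP sources i_up, DN sinks i_dn."""
    return i_up * up_ps - i_dn * dn_ps


def static_phase_offset(t_reset_ps, i_up, i_dn):
    """Locked phase error e = t_fb - t_ref (ps) at which net charge is zero.

    e >= 0: UP = e + t_reset, DN = t_reset -> e = t_reset * (i_dn - i_up) / i_up
    e <  0: UP = t_reset, DN = t_reset - e -> e = -t_reset * (i_up - i_dn) / i_dn
    """
    if i_dn >= i_up:
        return t_reset_ps * (i_dn - i_up) / i_up
    return -t_reset_ps * (i_up - i_dn) / i_dn


e = static_phase_offset(t_reset_ps=50.0, i_up=0.95, i_dn=1.0)  # 5% weak I_up
print(f"static phase offset: {e:.2f} ps")
```

With an assumed 50 ps reset delay, a 5% weaker I_up leaves a static phase offset of about 2.6 ps between the input and delayed clocks. This is the kind of error that the op-amp-based mismatch reduction in the charge pump is meant to suppress.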

Clock generator (CG)
Figure 5 shows the proposed clock generator. The CG receives two input signals 180° apart, senses their rising edges, and uses them to generate 50% duty-cycle output clocks that are also 180° apart.
In detail, when an input signal goes high, the pulse generator produces a short pulse that marks only its rising edge, regardless of the duty cycle of the input signal. The SDC converts these short pulses into differential pulses, which are applied to two different types of SR latch: a NAND type and a NOR type. Table 1 shows the truth tables of the SR latches. Because of their structural difference, the two latches produce output clocks with different pulse widths; Figure 6 shows the timing diagram of the SR latches. To reduce this difference, the outputs of the two SR latches are interpolated. As a result, the difference between the rising edge and the falling edge is about 1.5 gate delays, and the pulse width of the output clocks is approximately 50% of the input period. The error is further reduced by a latched comparator with low propagation delay.
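The standard truth tables of the two latch types can be reproduced with a zero-delay gate-level model. This is a logic-level sketch only: it captures the set/reset/hold behaviour, not the timing asymmetry that motivates interpolation. The model assumes, without confirmation from the text, that the NOR latch takes the active-high pulses and the NAND latch takes the complementary active-low pulses from the SDC, which would explain why differential pulses are needed. All names are hypothetical.

```python
def nor(a, b):
    return int(not (a or b))


def nand(a, b):
    return int(not (a and b))


def _settle(q, qb, step, max_iter=8):
    """Re-evaluate the cross-coupled gates until the outputs stop changing."""
    for _ in range(max_iter):
        nq, nqb = step(q, qb)
        if (nq, nqb) == (q, qb):
            return q, qb
        q, qb = nq, nqb
    raise RuntimeError("latch did not settle")


def nor_latch(s, r, q, qb):
    """NOR SR latch, active-high inputs: Q = NOR(R, QB), QB = NOR(S, Q)."""
    def step(q, qb):
        nq = nor(r, qb)
        return nq, nor(s, nq)
    return _settle(q, qb, step)


def nand_latch(s_b, r_b, q, qb):
    """NAND SR latch, active-low inputs: Q = NAND(S_b, QB), QB = NAND(R_b, Q)."""
    def step(q, qb):
        nq = nand(s_b, qb)
        return nq, nand(r_b, nq)
    return _settle(q, qb, step)


# Truth tables, starting from each stored state (Q, QB) = (0, 1) and (1, 0).
for s, r in ((0, 0), (1, 0), (0, 1), (1, 1)):
    print("NOR  S=%d R=%d:" % (s, r), [nor_latch(s, r, *st) for st in ((0, 1), (1, 0))])
for s_b, r_b in ((1, 1), (0, 1), (1, 0), (0, 0)):
    print("NAND S_b=%d R_b=%d:" % (s_b, r_b), [nand_latch(s_b, r_b, *st) for st in ((0, 1), (1, 0))])
```

The forbidden input combinations (S = R = 1 for NOR, S_b = R_b = 0 for NAND) drive both outputs to the same level, but in the clock generator they should not occur, since the set and reset pulses are short and come from edges 180° apart.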

Simulation results
The proposed 4-phase DCC and PC is fabricated in a standard-performance 65-nm CMOS process with a 1.2 V supply. Figure 7 shows the layout of the circuit; the active area is 165 µm × 79 µm. Figure 8 shows the output clocks for a 1 GHz input clock with a 40% duty cycle: the duty-cycle error is less than 5 ps and the phase error is less than 3 ps. Figure 9 shows the maximum duty-cycle error and phase error of the output clock at the typical process corner for different input frequencies and input duty cycles. The maximum duty-cycle error is always smaller than 1%, and the maximum phase error is always smaller than 1%. The proposed 4-phase DCC and PC operates with input frequencies from 550 MHz to 1.6 GHz and input duty cycles from 25% to 80%. Table 2 compares its performance with prior works. The proposed circuit provides 4-phase, 50% duty-cycle outputs with a wide operating frequency range and a wide duty-cycle correction range, achieves low output duty-cycle and phase errors, and occupies less chip area and consumes less power.
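The picosecond figures can be related to the percentage and degree figures with a short conversion. This sketch assumes that errors are normalized to the input clock period, which is the usual convention but is not stated explicitly in the text; the function names are hypothetical.

```python
def timing_error_to_percent(err_ps, freq_hz):
    """Timing error as a percentage of the input clock period."""
    return err_ps * 1e-12 * freq_hz * 100.0


def timing_error_to_degrees(err_ps, freq_hz):
    """Timing error as a phase angle, with one period = 360 degrees."""
    return err_ps * 1e-12 * freq_hz * 360.0


print(f"5 ps at 1 GHz = {timing_error_to_percent(5.0, 1e9):.2f} % of the period")
print(f"3 ps at 1 GHz = {timing_error_to_degrees(3.0, 1e9):.2f} degrees")
print(f"250 ps at 1 GHz = {timing_error_to_degrees(250.0, 1e9):.1f} degrees")
```

Under this convention, the 5 ps duty-cycle error at 1 GHz is 0.5% of the period, and the 3 ps phase error is about 1.08° on top of the ideal 90° (250 ps) spacing. The same absolute error is a larger fraction of a shorter period, e.g. 0.8% for a hypothetical 5 ps error at 1.6 GHz.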

Conclusion
In this paper, a DLL-based 4-phase DCC and PC circuit is presented. The proposed circuit not only corrects the duty cycle of the output clock to 50% but also generates 4-phase clocks with 90° phase spacing. In addition, it operates over a wide frequency range and performs duty-cycle and phase correction over a wide range of input duty cycles. It can therefore be used in duty-cycle and phase correction applications such as DDR memory and multi-phase I/O interfaces.