#### **Topics in IC Design**

## 6.1 Introduction to Clock and Data Recovery

Deog-Kyoon Jeong dkjeong@snu.ac.kr School of Electrical and Computer Engineering Seoul National University 2020 Fall

### Outline

- Background
  - Introduction to CDR
  - Jitter characteristics
- Architectures
  - Phase-tracking CDR
  - Blind oversampling CDR

#### Serial links

- A transmitter sends serialized NRZ data over a single wire or a wire pair.
- A receiver recovers the clock and data from the incoming NRZ data stream.
- Because of its high-speed capability, it is employed in high-speed communication over long distances.

Ex) SONET, Gigabit Ethernet, ...

Nowadays, many specifications that use multichannel serial links continue to appear in the area of backplane communications to meet high bandwidth requirements.

Ex) SATA, PCI-Express, DisplayPort, ...

- Clocking schemes
  - (a) RX has no reference clock.
  - (b) TX and RX each have their own reference clock, with a small frequency offset between them due to device mismatch.
  - (c) TX and RX share the same reference clock (clock forwarding, source synchronous).



- Jitter generation
  - Definition
    - CDR output jitter measured with jitter-free input data
  - Generally, 0.1UI peak-to-peak and 0.01UI RMS are specified
  - Jitter sources
    - Device noise
    - Ripple on control line
    - Substrate or supply noise

- Jitter transfer
  - Definition
    - Ratio of CDR output jitter to input sinusoidal jitter
  - In the specification of SONET, jitter peaking must be less than 0.1dB
    - Important for transceiver design to restrict jitter accumulation



- Jitter tolerance
  - Definition: peak-to-peak amplitude of sinusoidal jitter applied to the data input that brings the BER to the threshold of 10<sup>-12</sup>.
  - $JTOL(f)\,|H_{err}(f)| < 1\ \mathrm{UI}$, where $H_{err}(f) = 1 - H(f)$
  - $JTOL(f) < 1/|1 - H(f)|$ (in UI)
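
As a rough illustration of the bound $JTOL(f) < 1/|1 - H(f)|$, the sketch below evaluates it for a first-order jitter transfer function. The first-order model and the names `jtol_bound_ui` and `f_bw` are assumptions made for this example and do not come from the slides.

```python
def jtol_bound_ui(f, f_bw):
    """Upper bound on tolerable sinusoidal jitter (in UI) at frequency f.

    Assumes a first-order jitter transfer H(f) = 1 / (1 + j*f/f_bw),
    which is an illustrative model, not a specific CDR design.
    """
    h = 1.0 / (1.0 + 1j * f / f_bw)   # jitter transfer function H(f)
    return 1.0 / abs(1.0 - h)         # bound from JTOL(f) * |1 - H(f)| < 1 UI


if __name__ == "__main__":
    f_bw = 1e6  # hypothetical loop bandwidth in Hz
    for f in (1e4, 1e5, 1e6, 1e7, 1e8):
        print(f"{f:10.0f} Hz -> {jtol_bound_ui(f, f_bw):8.3f} UI")
```

Under this model the bound falls at 20 dB/decade below the loop bandwidth and flattens toward 1 UI above it, which matches the usual shape of a jitter tolerance mask.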



### **CDR Architectures**

- Phase tracking CDR
  - Feedback control
    - ex) PLL-based CDR, DLL-based CDR, CDR with combination of PLL and DLL, and PI-based CDR
- Blind oversampling CDR
  - Operates with the receiver's own clock (no feedback control)
- Topologies without feedback
  - Gated oscillator, high-Q bandpass filter architecture

Basic architecture



- Advantages
  - Input jitter rejection

- Since the input is a non-periodic data stream, a PD should be used instead of a PFD.
  - Linear PD, Binary PD
  - Full Rate, Half Rate, Quarter Rate

| PD type   | Characteristics |
|-----------|-----------------|
| Linear PD | - Well-defined gain<br>- No current when locked<br>- Requires limiting amplifier<br>- High-speed data gating |
| Binary PD | - Simple architecture<br>- Easy to adopt parallelism<br>- Inherently locked to optimum sampling position<br>- Unpredictable gain<br>- Large ripple noise when locked |

- Hogge PD
  - Inherent data retiming
  - The width of U1's output is dependent on the phase error.
  - The width of U2's output is constant which is half clock cycle wide.
  - When locked, it causes transition-dependent jitter even though the net pumped charge is zero.



Fig. 12. Waveforms of Hogge's detector with clock and data aligned.

© 2020 DK Jeong


- Triwave PD
  - Transition-dependent jitter is eliminated because the triwave has zero net area.
  - However, it is sensitive to duty cycle distortion due to unequal weighting.



Fig. 15. Triwave phase detector.



Fig. 16. Waveforms of triwave detector with clock and data aligned.


#### • Binary PD

- A dual-edge-triggered D flip-flop can be used as a PD.
- Drawbacks
  - It retains the previous output until the next transition of the data.
  - Since the PD samples the clock by the data, whereas the decision circuit samples the data by the clock, data retiming exhibits significant phase offset at high speed.



- Binary PD
  - Alexander PD (2x oversampling PD)
    - Inherent data retiming
    - Zero DC output in the absence of data transition
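
The decision logic of an Alexander PD can be sketched in a few lines. The function name `alexander_pd`, the three-sample interface, and the sign convention (+1 for an early clock) are assumptions for illustration; actual implementations differ in how the samples are retimed.

```python
def alexander_pd(d_prev, edge, d_curr):
    """Bang-bang phase decision from two data samples and one edge sample.

    d_prev, d_curr: consecutive data samples (0 or 1)
    edge: sample taken by the edge clock between them (0 or 1)
    Returns +1 if the clock is early, -1 if late, 0 if there is no transition.
    The sign convention is an assumption of this sketch.
    """
    if d_prev == d_curr:
        return 0  # no data transition: no phase information, zero output
    # The edge sample still equals the old bit: the edge clock fired before
    # the data transition, so the clock is early.
    return 1 if edge == d_prev else -1


if __name__ == "__main__":
    for samples in [(0, 0, 1), (0, 1, 1), (1, 1, 1), (1, 0, 0)]:
        print(samples, "->", alexander_pd(*samples))
```

The zero return for equal data samples corresponds to the "zero DC output in the absence of data transition" property listed above.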



- Binary PD
  - Various binary PDs



- Frequency acquisition
  - Since a PD has a limited pull-in range, a frequency acquisition loop is needed, especially when a wide-range VCO is used.
  - When only a PD is used, a harmonic lock or a false lock can occur as shown below.



- Frequency acquisition with a reference clock
  - In <u>dual VCO locking</u>, coarse control voltage is fed from another frequency (or phase) tracking loop, which is typically TX PLL.
  - In <u>sequential locking</u>, a lock detector determines which loop is activated.



(a) dual VCO locking



(b) sequential locking

### **Frequency Acquisition**

#### • Frequency acquisition without a reference clock

- A feedback loop with a frequency detector accomplishes frequency locking.
- The bandwidth of frequency-locked loop should be much smaller than that of phase-locked loop.



#### **Frequency Detector**

Operation of the frequency detector





#### Half-Rate PFD

- In PD, rising and falling edges of the half-rate quadrature clock coincide with data edges. (Very similar to Binary PD)
- In FD, if the clock is slow, V<sub>PD1</sub> leads V<sub>PD2</sub>. When V<sub>PD2</sub> is sampled by the rising edge of V<sub>PD1</sub>, the result is negative. When V<sub>PD2</sub> is sampled by the falling edge of V<sub>PD1</sub>, the result is positive.



Figure 5.3.4: Phase and frequency detector.

Figure 5.3.3: Phase detector.

### **Frequency Comparator**

#### Use of two counters

- One generates start/stop, the other counts the clock pulses.
- Determines if frequency difference is within a certain bound.
- Drives the loop within the lock-in range.
- May have a hysteresis to accommodate the difference between lock-in range and capture range.
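
The two-counter comparison with hysteresis can be sketched as follows. It assumes one counter defines a measurement window from the reference clock and the other delivers `vco_count`, the number of VCO cycles seen in that window. The class name `LockDetector` and the thresholds `lock_tol` and `unlock_tol` are hypothetical.

```python
class LockDetector:
    """Frequency comparator with hysteresis (illustrative sketch).

    expected_count: VCO cycles expected in one reference window
    lock_tol:       count error allowed before lock is declared
    unlock_tol:     larger count error allowed before lock is dropped
    """

    def __init__(self, expected_count, lock_tol, unlock_tol):
        if lock_tol > unlock_tol:
            raise ValueError("lock_tol must not exceed unlock_tol")
        self.expected_count = expected_count
        self.lock_tol = lock_tol
        self.unlock_tol = unlock_tol
        self.locked = False

    def update(self, vco_count):
        """Process one window's count and return the lock state."""
        err = abs(vco_count - self.expected_count)
        if self.locked:
            # Once locked, tolerate the wider bound before giving up.
            if err > self.unlock_tol:
                self.locked = False
        elif err <= self.lock_tol:
            # Declare lock only inside the tighter bound.
            self.locked = True
        return self.locked


if __name__ == "__main__":
    det = LockDetector(expected_count=1000, lock_tol=2, unlock_tol=8)
    for count in (1010, 1001, 1006, 1010, 1006):
        print(count, det.update(count))
```

The gap between the two thresholds keeps the detector from toggling when the frequency error sits near a single bound.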



## **Limitations of Analog CDRs**

- Analog blocks start to show limitations in deep-submicron CMOS
  - Severely affected by low supply voltage, low output impedance, leakage, and increased flicker noise.
  - Low supply voltage and poor transistor output impedance aggravate current mismatch in the CP and introduce ripple.
  - MOS capacitor in LF → large leakage current → large ripple in control voltage → deterministic jitter (pattern jitter)
     \*The area of MIM capacitor, which has no leakage, is x20 larger than that of MOS capacitors @ 90nm.
  - The large loop filter couples substantial amounts of substrate noise into the sensitive control voltage node

[1] P. K. Hanumolu et al, CICC. 2007


### **Advantages of All-Digital CDRs**

- Digital blocks offer advantages.
  - Higher speed with deep submicron CMOS.
  - Less susceptible to short-channel effects.
  - Compact circuit realization.
  - Eliminates the deterministic jitter caused by capacitor leakage and charge pump current mismatch.
  - Loop dynamics, which are set by the DLF coefficients, can be easily programmed and are also immune to PVT variations.
  - Good portability for newer processes.
  - DLF eliminates the noise coupling problem.

[1] P. K. Hanumolu et al, CICC. 2007

- ADPLL-based CDR with a digital loop filter
  - An analog loop filter has several limitations in a deep-submicron process.
    - Large leakage in deep submicron process
    - Large area
    - Large capacitance variation
  - Use of a digital loop filter
    - Robust gain without regard to PVT variations
    - Smaller area
  - Architectures
    - Hybrid architecture with VCO
      - Only the loop-filter capacitor is replaced by a digital loop filter; the resulting integral information is passed to the VCO after low-pass filtering.
    - ADPLL architectures with DCO

- A PLL-based CDR is not a good solution for multichannel integration
  - Large area for each loop filter
  - Multichannel crosstalk/pulling

#### Features

- If the clock with the same frequency is provided, the DLL can be used for data recovery.
- Multi-channel data recovery with shared input clock.
- It does not work if a frequency offset exists.
- No jitter peaking (1<sup>st</sup>-order loop)




### **CDR with combination of PLL/DLL**

- CDR with a 2<sup>nd</sup> order PLL
  - Jitter peaking (For peaking of < 0.1dB,  $\zeta$  should be >4.66)
  - 2<sup>nd</sup> high pole determines the jitter transfer corner frequency and jitter tolerance corner frequency.
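
To see how jitter peaking depends on the damping factor, the sketch below assumes the textbook second-order jitter transfer function $H(s) = (2\zeta\omega_n s + \omega_n^2)/(s^2 + 2\zeta\omega_n s + \omega_n^2)$ and evaluates its peak magnitude in closed form. This transfer function and the name `jitter_peaking_db` are assumptions of the example; the slide's threshold may come from a different approximation.

```python
import math


def jitter_peaking_db(zeta):
    """Peak of |H(jw)| in dB for the assumed second-order transfer function.

    With u = (w/wn)^2 and a = 4*zeta^2:
        |H|^2 = (1 + a*u) / ((1 - u)^2 + a*u)
    Setting the derivative to zero gives a*u^2 + 2*u - 2 = 0.
    """
    a = 4.0 * zeta ** 2
    u_peak = (math.sqrt(1.0 + 2.0 * a) - 1.0) / a   # positive root
    mag_sq = (1.0 + a * u_peak) / ((1.0 - u_peak) ** 2 + a * u_peak)
    return 10.0 * math.log10(mag_sq)


if __name__ == "__main__":
    for zeta in (0.707, 1.0, 2.0, 4.66, 10.0):
        print(f"zeta = {zeta:5.2f} -> peaking = {jitter_peaking_db(zeta):.4f} dB")
```

Under this model the peaking never reaches exactly zero for finite $\zeta$, but it drops below 0.1 dB for a heavily overdamped loop such as $\zeta = 4.66$.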



## **CDR with combination of PLL/DLL**

#### CDR with combination of PLL and DLL

- Proportional term exists with no zero.
- No jitter peaking
- Jitter transfer corner frequency and jitter tolerance corner frequency are independently controlled.



(a) a shared tracking loop

[D. Dalton] "12.5-Mb/s to 2.7-Gb/s continuous-rate CDR with automatic frequency acquisition and data-rate readback," IEEE JSSC, 2005, Dec.

## **CDR with combination of PLL/DLL**

#### CDR with combination of PLL and DLL

- The previous design does not work with the limited delay range if the initial delay of VCDL is far from the middle of covered range.
- Two loops are independently controlled.



#### **PI-Based CDR**

- Phase Interpolator based CDR
  - It has the same dynamics and structure as a DLL-based CDR.
  - The delay range is unlimited thanks to the phase rotator.
  - It works even when a small frequency offset exists.
  - It is appropriate for multichannel integration but the routing of multiphase clocks is necessary.
  - No jitter peaking



## **Blind Oversampling CDR**

#### Features

- Feed-forward architecture
- Fast acquisition and inherent stability
- Each received data bit sampled at multiple points
- After the bit boundary is estimated, the sample farthest from the bit boundaries is selected as the retimed data.





#### **Gated-Oscillator-Based CDR**

#### Burst-mode CDR

- The synchronous clock is derived from the gated oscillator which is triggered from the edges of data.
- Fast synchronous clock recovery and data acquisition
- No jitter rejection due to open loop
- Phase alignment is sensitive to PVT variations
- Ex) passive optical networks, optical packet routing systems



### **High-Q based CDR**

- Very old version of a CDR circuit
- Since it needs high Q filter, the monolithic integration is difficult.
- Delay unit for maximum sampling margin is necessary and it is typically PVT variant.



## **Topics in IC Design**

# 6.2 Injection Locked Clock Recovery

Deog-Kyoon Jeong dkjeong@snu.ac.kr School of Electrical and Computer Engineering Seoul National University 2020 Fall

**Compliments to MS Chu** 

## Outline

- Why ILO in CDR?
- Recent works
- Summary
- References

# Why ILO in CDR?



[Passive optical networks]

[Broadband correlator]

- Burst-mode CDRs
- PON: passive optical networks
- Broadband correlator


- Direct data injection: random sequence
- Frequency acquisition: PLL/FLL (FREF, plesiochronous)
- Phase acquisition: injection



[JSSC'08 Lee et al.]

- 3 VCOs: large power consumption
- Data sampling point: manually controlled
- Mismatches between 3 VCOs: not concerned
- $\Delta f$  between N·fref & Input data: not concerned

[JSSC'08 Lee et al.]



[Illustration of  $\Delta f$  effects]

[Possible realization]

- $\Delta f$  degrades BER performance
- Possible realization using fref from TX
- The loop cannot track optimum sampling point



- 1 VCO: no mismatch concerns
- Data sampling point: manually controlled
- <u>$\Delta f$ between N·fref & Input data: not a problem for short bursts</u>



- Inductive coupling
- No edge detector (XOR gate, bandwidth bottleneck)
- VCTRL from replica PLL (plesiochronous)
- Data sampling point: manually controlled



N-channel parallel inductive-coupling CDRs

# **Tuning Fosc & ΦINJ from BBPD outputs**



- Phase & frequency control from single BBPD
- BBPD pair comparison

# **Tuning Fosc & ΦINJ from BBPD outputs**



#### [JSSC'16 Masuda et al.]

• It solves 2-point modulation problem

# **Tuning Fosc & ΦINJ from BBPD outputs**

#### [JSSC'16 Masuda et al.]



- Capture range is wide enough to claim referenceless CDR
- 1UIpp JTOL @ 120MHz: very large compared with PLL-based CDRs

#### References

- [6.2.1] **J. Lee et al.,** "A 20-Gb/s Burst-Mode Clock and Data Recovery Circuit Using Injection-Locking Technique," *JSSC*, 2008.
- [6.2.2] J. Terada et al., "A 10.3 Gb/s Burst-Mode CDR Using a Δ∑DAC," JSSC, 2008.
- [6.2.3] **Y. Take et al.,** "A 30 Gb/s/Link 2.2 Tb/s/mm Inductively-Coupled Injection-Locking CDR for High-Speed DRAM Interface," *JSSC*, 2011.
- [6.2.4] **W.-S. Choi et al.,** "A Burst-Mode Digital Receiver With Programmable Input Jitter Filtering for Energy Proportional Links," *JSSC*, 2015.
- [6.2.5] **T. Masuda et al.,** "A 12 Gb/s 0.9 mW/Gb/s Wide-Bandwidth Injection-Type CDR in 28 nm CMOS With Reference-Free Frequency Capture," *JSSC*, 2016.

# **Topics in IC Design**

# 6.3 Cases of Clock and Data Recovery Circuits

Deog-Kyoon Jeong dkjeong@snu.ac.kr School of Electrical and Computer Engineering Seoul National University 2020 Fall

#### Outline

- Introduction
- Traditional CDRs
  - Analog and Digital PLL
  - PI-based CDR
  - Blind oversampling CDR
  - Hybrid CDR
- All-Digital CDR

## **CDR with Hybrid Loop Filter**

Architecture



M. H. Perrott et al, "A 2.5-Gb/s multi-rate 0.25-µm CMOS clock and data recovery circuit utilizing a hybrid analog/digital loop filter and all-digital referenceless frequency acquisition," IEEE JSSC, Dec. 2006

[2] M. H. Perrott, JSSC, 2006

## **Hybrid Loop Filter**

- Hybrid loop filter
  - Analog feed-forward path
  - A charge pump I<sub>f</sub> followed by low-pass RC network with a BW of 40MHz. (CDR BW is 1MHz)



## **Hybrid Loop Filter**

- Hybrid loop filter
  - Digital integration path
  - Decimator, digital accumulator, and low-speed sigma-delta DAC (155MHz operation).



### **CDR with Hybrid LF: Features**

- Features
  - Multi-rate operation with low bandwidth and high damping factor

(62.5kHz@155Mb/s, 250kHz@622Mb/s, 1MHz@2.5Gb/s)

- Linear phase-to-digital converter: Hogge PD and sigma-delta ADC.
- hybrid loop filter with small area without large loop-filter capacitor
- Hybrid VCO: LC oscillator with capacitor array and varactor

#### **Conventional PDs (1)**

- Conventional digital PDs and associated signals
  - Alexander PD
    - Bang-bang characteristic leads to highly nonlinear dynamics.



## **Conventional PDs (2)**

- Conventional digital PDs and associated signals
  - Linear BB PD with high oversampling ratio
    - More phase information, but at the cost of increased area, power consumption, and clock loading



[2] M. H. Perrott, JSSC, 2006

## **Conventional PDs (3)**

- Conventional analog PD and associated signal
  - Hogge PD
    - Linear characteristic
    - But, the output is an analog signal, i.e., the pulsewidth is proportional to the phase difference



#### Phase-to-Digital Converter (1)

- Simplified diagram of the proposed phase-to-digital converter
  - Hogge PD +  $1^{st}$ -order  $\Sigma$ - $\Delta$  ADC
  - Linear characteristic and digital output
  - Amp lowers the number of metastable events.



[2] M. H. Perrott, JSSC, 2006

#### Phase-to-Digital Converter (2)

- Detailed diagram of the proposed phase-to-digital implementation
  - Buffer compensates the clk-to-Q delay of the 1<sup>st</sup> register.
  - Intermediate latch achieves equal loading for the register and latches feeding into the XOR gates.



[2] M. H. Perrott, JSSC, 2006

#### **CDR with Hybrid LF: Hybrid VCO**

- Hybrid analog/digital VCO and its control
  - Hybrid analog/digital VCO is required since the process is an old 0.25µm process.
  - All-digital implementation with no varactor is attractive at 130nm and 90nm CMOS.



#### **ADPLL** with **BB-PD**

Architecture



• P. K. Hanumolu et al, "A 1.6Gbps digital clock and data recovery circuit," IEEE CICC, 2006.

#### **ADPLL with BB-PD: Architecture**

Detailed Architecture



- Full-rate architecture with a BB PD (Alexander PD)
- 5-level DAC (PDAC+IDAC)
- Adder is operated at a quarter-rate with a decimated input
- The proportional and integral paths are split.
  - It minimizes loop latency of main proportional path and dithering jitter

Dithering Jitter ~ 2T<sub>latency</sub>·K<sub>P</sub>·K<sub>DCO</sub>

[3] P. K. Hanumolu, CICC, 2006


#### **ADPLL with BB-PD: DLF**

- Digital Loop Filter
  - Proportional path
    - The PD output (3-level signal: early, late, no transition) from the BB PD is directly transferred to a current-mode DAC (PDAC).
  - Integral path
    - The PD output is integrated with a 14-bit accumulator operating at a quarter rate, after deserialization and majority voting.
    - 11 MSBs of the 14-bit output are truncated to 3-levels using a 2<sup>nd</sup>-order DSM and then transferred to a current-mode DAC (IDAC).
  - Over-damped response:

PhaseChange<sub>proportional</sub> /PhaseChange<sub>integral</sub> > 1000
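
The split proportional and integral paths can be sketched behaviorally as below. The sketch covers only majority voting and the 14-bit accumulator; it omits the DSM and the DACs. The class name `SplitPathLoopFilter`, the list interface, and the saturation of the accumulator are assumptions made for illustration.

```python
def majority_vote(pd_samples):
    """Reduce a group of ternary PD outputs (-1, 0, +1) to a single vote."""
    total = sum(pd_samples)
    return (total > 0) - (total < 0)


class SplitPathLoopFilter:
    """Behavioral sketch of a digital loop filter with split paths."""

    def __init__(self, acc_bits=14):
        # Signed accumulator range; saturation is an assumption of this sketch.
        self.acc_max = 2 ** (acc_bits - 1) - 1
        self.acc_min = -(2 ** (acc_bits - 1))
        self.acc = 0

    def step(self, pd_samples):
        """Process one quarter-rate cycle holding four full-rate PD outputs.

        Returns (proportional, integral): the PD outputs forwarded unchanged
        for the proportional path, and the updated accumulator value.
        """
        proportional = list(pd_samples)       # bypasses the accumulator
        vote = majority_vote(pd_samples)      # decimated input to the adder
        self.acc = max(self.acc_min, min(self.acc_max, self.acc + vote))
        return proportional, self.acc


if __name__ == "__main__":
    lf = SplitPathLoopFilter()
    for group in ([1, 1, -1, 0], [1, 1, 1, 1], [-1, -1, 0, 0]):
        print(group, "->", lf.step(group))
```

Forwarding the PD outputs without passing them through the adder is what keeps the latency of the proportional path short in this model.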

#### **ADPLL with BB-PD: DCO**

- DCO: current-mode DAC + VCO
  - Load resistance in delay elements is varied by DACs.
  - Fine control voltage  $V_F$  is controlled in 5 steps.
  - Coarse control voltage V<sub>c</sub> is controlled off-chip.



#### All-Digital CDR with DCO

Overall Architecture



• D.-H. Oh et al, "A 2.8Gb/s all-digital CDR with a 10b monotonic DCO," IEEE, ISSCC, 2007.

#### **ADCDR with DCO: Features**

Overall block diagram



- Fully-implemented ADCDR circuit.
- Wide-range DCO with 10-bit integral code.
- Full-rate architecture with a BB PD (Alexander PD).
- The proportional and integral paths are split.

## **ADCDR with DCO: Monotonic DCO**

- DCO overview
  - Supply-regulated inverter-based ring oscillator with a digitally-controlled resistor
  - Resistance is controlled in a wide range.
  - Split-tuned control for integral and proportional paths
  - Glitch reduction scheme in the integral path



## Monotonic DCO(1)

- Digitally-Controlled Resistor
  - Resistors are implemented with PMOS Transistors.
  - 1024 PMOS transistors are sequentially turned on for monotonic characteristic according to a 10-bit integral word.
  - 10-bit binary code → row and column thermometer codes → 1024 thermometer codes.



[4] D.-H. Oh, ISSCC, 2007

## **Monotonic DCO(3)**

- Insert a vertical resistor between rows
  - Reduce  $f_{step}$  when the control code is small.



[4] D.-H. Oh, ISSCC, 2007

## **Glitchless DCO (1)**

- Glitch problem
  - When a row code changes, a glitch can occur because of the delay mismatch between the row code and the column code.



(row=4, column=32)


## **Glitchless DCO(2)**

- Glitch-less switching
  - As a code increases, the column code increases when the row is even, and it decreases when the row is odd
  - Even rows turn on if the column code is '1'.
  - Odd rows turn on if the column code is '0'.
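
The alternating column direction can be checked with a small behavioral model. The array size, the function names, and the row decoding rule (rows below the active row fully on, rows above fully off) are assumptions for illustration; the actual circuit may decode rows differently.

```python
def column_ones(code, cols):
    """Number of '1's in the column thermometer code for a control code.

    The column code counts up on even rows and down on odd rows.
    """
    row, pos = divmod(code, cols)
    return pos if row % 2 == 0 else cols - pos


def array_state(code, rows, cols):
    """Flattened on/off state of a rows x cols switch array (hypothetical)."""
    active_row = code // cols
    ones = column_ones(code, cols)
    col_code = [1 if j < ones else 0 for j in range(cols)]
    state = []
    for i in range(rows):
        for j in range(cols):
            if i < active_row:
                on = True                  # rows below the active row: all on
            elif i > active_row:
                on = False                 # rows above the active row: all off
            elif i % 2 == 0:
                on = col_code[j] == 1      # even row turns on where column is '1'
            else:
                on = col_code[j] == 0      # odd row turns on where column is '0'
            state.append(on)
    return state


if __name__ == "__main__":
    rows, cols = 4, 8
    for code in (7, 8, 9, 15, 16):
        s = array_state(code, rows, cols)
        print(code, column_ones(code, cols), sum(s))
```

In this model the column code changes by a single bit even when the row changes, so the column bits never have to flip all at once at a row boundary.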



#### **Glitchless DCO(3)**



- Only one PMOS transistor turns on or off at a time.

[4] D.-H. Oh, ISSCC, 2007

# **Direct Forward Path**

- Proportional path is split to minimize latency
  - Bit-rate speed, one gate-delay latency
  - Only 2-bit accuracy: -1 / 0 / +1
  - Directly forwarded to DCO



#### **Blind Oversampling CDR**

Architecture



• B.-J. Lee et al, "A quad 3.125Gbps transceiver cell with all-digital data recovery circuits", IEEE SOVC, 2005.

## **Blind Oversampling CDR**

- Architecture
  - The receiver except sampling latches is described in HDL and synthesized.
  - Performance immune to device noise and process-dependent parameters
  - The receiver operates in a local-clock domain.
  - The frequency offset between incoming data and local clock is compensated by controlling the number of data bits recovered in a single cycle.

## **Digital Phase Tracker (1)**

- Blind oversampling receiver
  - First samples the data blindly, and then extracts the phase information by post-processing the sampled data.



Bit boundary detection in blind oversampling receiver

- 3x-oversampling with  $Ø_1, Ø_2, Ø_3$
- Bit boundary location is found by averaging the edge-detect information
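
A minimal sketch of bit boundary detection by edge voting is shown below. It assumes a fixed boundary over the whole sample window and no frequency offset, and it uses a simple vote count in place of the averaging filter of the actual receiver. The function names are hypothetical.

```python
def detect_boundary(samples, osr=3):
    """Return the phase index p with the most transitions between p and p+1.

    samples: oversampled bit stream (list of 0/1), osr samples per bit
    """
    votes = [0] * osr
    for i in range(len(samples) - 1):
        if samples[i] != samples[i + 1]:
            votes[i % osr] += 1          # edge seen just after phase i % osr
    return max(range(osr), key=lambda p: votes[p])


def recover_bits(samples, osr=3):
    """Pick the sample farthest from the detected bit boundary."""
    boundary = detect_boundary(samples, osr)
    pick = (boundary + 1 + osr // 2) % osr   # center phase of the bit
    return [s for i, s in enumerate(samples) if i % osr == pick]


if __name__ == "__main__":
    bits = [1, 0, 1, 1, 0, 0, 1]
    stream = [b for b in bits for _ in range(3)]   # ideal 3x oversampling
    shifted = stream[1:]                           # unknown sampling phase
    print(detect_boundary(shifted), recover_bits(shifted))
```

Because the boundary is estimated from the data itself, the same code recovers the bits regardless of which sampling phase the stream starts on, as long as the first bit is not cut down to a single sample.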

[8] B.-J. Lee, SOVC, 2005

## **Digital Phase Tracker (2)**

- Oversampling ratio
  - DJ and RJ in the XAUI spec: 0.55UI in total, leaving an eye opening of 0.45UI
  - The maximum achievable JTOL increases with higher OSR
    - The maximum achievable JTOL of 3x oversampling: 0.45UI - 0.33UI = 0.12UI
  - This leaves a margin of only 0.02UI over the XAUI high-frequency JTOL requirement (> 0.1UI), which is too small.
  - Thus, an OSR of 5 is chosen.
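
The arithmetic above can be generalized to other oversampling ratios. The formula below is inferred from the 3x calculation (eye opening minus one sample spacing) and is an assumption of this sketch, as is the name `max_jtol_ui`.

```python
def max_jtol_ui(osr, total_jitter_ui=0.55):
    """Maximum achievable high-frequency JTOL in UI for a given OSR.

    Assumes JTOL = (1 - total jitter) - 1/OSR, as inferred from the
    3x oversampling example: 0.45 UI - 0.33 UI = 0.12 UI.
    """
    eye_opening = 1.0 - total_jitter_ui   # 0.45 UI for the XAUI numbers
    sample_spacing = 1.0 / osr            # phase quantization step
    return eye_opening - sample_spacing


if __name__ == "__main__":
    for osr in (3, 4, 5, 8):
        print(f"OSR {osr}: {max_jtol_ui(osr):.3f} UI")
```

With these numbers an OSR of 5 gives 0.25 UI, well above the 0.1 UI requirement, whereas an OSR of 3 barely clears it.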



Fig. 3 (a) High-frequency JTOL of 3-x oversampling receiver, (b) maximum achievable JTOL for various OSR.

[8] B.-J. Lee, SOVC, 2005


## **Digital Phase Tracker (3)**

#### Phase averaging window

- A larger number of phase samples
  - sharpens the jitter PDF and improves the correct phase detection probability (P<sub>0</sub>)
  - results in better receiver performance at the expense of more hardware resources.
- 50~150 phase samples are required to achieve a BER of less than 10<sup>-12</sup>.



Fig. 4 (a) Phase detection through phase averaging, (b) calculated BER versus averaging window size.

[8] B.-J. Lee, SOVC, 2005

## **Digital Phase Tracker (4)**



Fig. 5 Phase tracker block diagram

[8] B.-J. Lee, SOVC, 2005

### References

- [1] P. K. Hanumolu et al, "Digitally-enhanced phase-locking circuits," IEEE CICC, 2007
- [2] M. H. Perrott et al, "A 2.5-Gb/s multi-rate 0.25-µm CMOS clock and data recovery circuit utilizing a hybrid analog/digital loop filter and all-digital referenceless frequency acquisition," IEEE JSSC, Dec. 2006.
- [3] P. K. Hanumolu et al, "A 1.6Gbps digital clock and data recovery circuit," IEEE CICC, 2006.
- [4] D.-H. Oh et al, "A 2.8Gb/s all-digital CDR with a 10b monotonic DCO," IEEE ISSCC, 2007.
- [5] H. Song et al, "1.0-4.0-Gb/s all-digital CDR with 1-ps period resolution DCO and adaptive proportional gain control," IEEE JSSC, to be published
- [6] H. Lee et al, "Improving CDR performance via estimation", IEEE ISSCC, 2006.
- [7] S. Sidiropoulos et al, "A semidigital dual delay-locked loop", IEEE JSSC, Nov. 1997.
- [8] B.-J. Lee et al, "A quad 3.125Gbps transceiver cell with all-digital data recovery circuits", IEEE SOVC, 2005.

## **Topics in IC Design**

# 6.4. Baud-Rate Timing Recovery

Deog-Kyoon Jeong dkjeong@snu.ac.kr School of Electrical and Computer Engineering Seoul National University 2020 Fall Compliment to Moon-Chul Choi

## Outline

- Introduction
- Baud-Rate CDRs
  - Mueller-Müller (MM) CDR
  - Sign-Sign Mueller-Müller (SS-MM) CDR
  - Other Baud-Rate CDRs

## What is Baud-Rate CDR?

• Use only one phase of CLK per bit





### 2x-Oversampling CDR vs. Baud-Rate CDR



In high speed I/Os, baud-rate CDR is preferable to reduce clocking power.

## **Baud-Rate CDRs**

- Mueller-Müller CDR
  - Requires an elaborate ADC => increased complexity
- Sign-Sign Mueller-Müller CDR (SS-MM CDR)
  - MM timing function can be simplified
- Other Baud-Rate CDRs

## **Mueller-Müller Timing Function**

#### <Principle of MM timing recovery>



#### **Received signal**

$$x(t) = \sum_{m} A_{m}h(t - mT_{b})$$

k<sup>th</sup> sample at t=kT<sub>b</sub>+ $\tau_k$ 

$$x_{k} = x(kT_{b} + \tau_{k}) = \sum_{m} A_{m}h[(k-m)T_{b} + \tau_{k}] = \sum_{i} A_{k-i}h(iT_{b} + \tau_{k})$$

<Block diagram>



Multiply $x_k$ by $A_{k-1}$ and assume independent and equiprobable data:

$$E[x_k A_{k-1}] = \sum_i E[A_{k-i} A_{k-1}]\, h(iT_b + \tau_k) \approx A^2 h(\tau_k + T_b)$$

Similarly,

$$E[x_{k-1} A_k] \approx A^2 h(\tau_k - T_b)$$

$$E[x_k A_{k-1} - x_{k-1} A_k] \approx A^2 [h(\tau_k + T_b) - h(\tau_k - T_b)]$$
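
The MM expectation can be checked numerically for a simple pulse. The sketch below enumerates all equiprobable ±1 data patterns in a short window and averages $x_k A_{k-1} - x_{k-1} A_k$ exactly. The triangular pulse `pulse`, the window length, and the function names are assumptions chosen for illustration.

```python
from itertools import product


def pulse(t, T=1.0):
    """Hypothetical symmetric triangular pulse, nonzero for |t| < 1.5*T."""
    return max(0.0, 1.0 - abs(t) / (1.5 * T))


def sample(symbols, k, tau, T=1.0):
    """x_k = sum over m of A_m * h((k - m)*T + tau)."""
    return sum(a * pulse((k - m) * T + tau, T) for m, a in enumerate(symbols))


def mm_expectation(tau, T=1.0):
    """Exact E[x_k*A_{k-1} - x_{k-1}*A_k] over all +/-1 patterns (A = 1)."""
    n, k = 6, 3   # window wide enough to cover the pulse support around k
    total = 0.0
    for symbols in product((-1.0, 1.0), repeat=n):
        x_k = sample(symbols, k, tau, T)
        x_km1 = sample(symbols, k - 1, tau, T)
        total += x_k * symbols[k - 1] - x_km1 * symbols[k]
    return total / 2 ** n


if __name__ == "__main__":
    for tau in (-0.2, -0.1, 0.0, 0.1, 0.2):
        predicted = pulse(tau + 1.0) - pulse(tau - 1.0)
        print(f"tau = {tau:+.1f}: {mm_expectation(tau):+.4f} vs {predicted:+.4f}")
```

For this symmetric pulse the averaged timing function crosses zero at $\tau = 0$ and changes sign with the direction of the timing error, which is what lets a loop lock onto it.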

[K. Mueller, TC 76']

## Sign-Sign Mueller-Müller CDR

- MM timing function can be simplified.
- Sign-Sign MM: Instead of ADC,
  - Two sampled binary results, $\text{Sign}(x_k - A_k)$ and $\text{Sign}(x_{k-1} - A_{k-1})$, are used. Recall that $E[x_k A_{k-1} - x_{k-1} A_k] \approx A^2[h(\tau_k + T_b) - h(\tau_k - T_b)]$.
  - *Timing function*:

$$\tau = x_k A_{k-1} - x_{k-1} A_k = x_k A_{k-1} - A_k A_{k-1} - x_{k-1} A_k + A_k A_{k-1}$$

$$= (x_k - A_k)A_{k-1} - (x_{k-1} - A_{k-1})A_k$$

$$\cong \text{Sign}(x_k - A_k)A_{k-1} - \text{Sign}(x_{k-1} - A_{k-1})A_k$$

- Note: MM timing function is valid whether a transition is present or absent.
- For a rising transition $(A_{k-1} = -A_{ref}, A_k = +A_{ref})$, $\tau = -\text{Sign}(x_k - A_{ref}) - \text{Sign}(x_{k-1} + A_{ref})$ (with the common factor $A_{ref}$ dropped), where $\text{Sign}(x_k - A_{ref})$ is an error sampler output with a reference voltage of $A_{ref}$ and $\text{Sign}(x_{k-1} + A_{ref})$ is another error sampler output with a reference voltage of $-A_{ref}$.
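
The sign-sign timing function can be written as a small function of the data decisions and error-sampler outputs. The names `ss_mm_timing` and `a_ref`, and the use of ±1 data decisions scaled by `a_ref`, are assumptions of this sketch.

```python
def sign(v):
    """Binary sampler model: +1 for v >= 0, otherwise -1."""
    return 1 if v >= 0 else -1


def ss_mm_timing(x_k, x_km1, d_k, d_km1, a_ref=1.0):
    """Sign-sign MM timing function, normalized by a_ref.

    x_k, x_km1: analog samples at times k and k-1
    d_k, d_km1: data decisions (+1 or -1), so that A_k = d_k * a_ref
    Implements Sign(x_k - A_k)*A_{k-1} - Sign(x_{k-1} - A_{k-1})*A_k.
    """
    e_k = sign(x_k - d_k * a_ref)        # error sampler output at time k
    e_km1 = sign(x_km1 - d_km1 * a_ref)  # error sampler output at time k-1
    return e_k * d_km1 - e_km1 * d_k


if __name__ == "__main__":
    # Rising transition: d_km1 = -1, d_k = +1
    print(ss_mm_timing(x_k=0.8, x_km1=-1.2, d_k=1, d_km1=-1))
    print(ss_mm_timing(x_k=1.2, x_km1=-0.8, d_k=1, d_km1=-1))
    print(ss_mm_timing(x_k=1.2, x_km1=-1.2, d_k=1, d_km1=-1))
```

For the rising transition the function reduces to $-\text{Sign}(x_k - A_{ref}) - \text{Sign}(x_{k-1} + A_{ref})$, so only the two error-sampler bits and the data decisions are needed.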

## Case Study: SS-MM CDR [1/5]



[F. Spagna, ISSCC 10']

## Case Study: SS-MM CDR [2/5]

Direct feedback method: select Vref or –Vref depending on the past data



Reduce one V<sub>REF</sub> sampler => Samples/bit: 2 (data, error)

2020-11-13

However, feedback loop becomes a bottleneck in high-speed operation.


## Case Study: SS-MM CDR [3/5]

Comparison w/ SS-MM PD Scheme



## Case Study: SS-MM CDR [3/5]

#### Phase Detection Operation




## Case Study: SS-MM CDR [4/5]

#### • SS-MM CDR with DFE



- MM-CDR forces s(-1) = s(1) and DFE forces s(1) = 0
- CDR lock point shifts left => reduced margin
  - Add a digital offset proportional to DFE c(1)

# Case Study: SS-MM CDR [4/5]

• When DFE fully cancels post-cursors with remaining pre-cursor,



## Case Study: Other Baud-Rate CDRs [1/5]

• Detect phase from 1-tap speculative DFE







- # of samplers: 2/3
- # of clock phases: 1/2

[T. Shibasaki, ISSCC 16']

## Case Study: Other Baud-Rate CDRs [1/5]







[Figure labels: data decision, early-late detection, DH, DL]

## Case Study: Other Baud-Rate CDRs [1/5]





[T. Shibasaki, ISSCC 16']

## Case Study: Other Baud-Rate CDRs [2/5]

Integration & reset based baud-rate CDR



[J. Han, JSSC 17']

## Case Study: Other Baud-Rate CDRs [2/5]

Phase detection operation



[J. Han, JSSC 17']

## Case Study: Other Baud-Rate CDRs [3/5]

Sub-baud-rate CDR (integration based)



TABLE I: Summary of hardware requirements.

| Architecture                    | 2x oversampling | Mueller-Muller<br>(Baud-rate) | This work<br>(Sub-baud-rate) |
|---------------------------------|-----------------|-------------------------------|------------------------------|
| # of clock phases<br>per symbol | 2               | 1                             | 0.5                          |
| # of samplers per<br>symbol     | 2               | 3                             | 2                            |



[D. Kim, JSSC 19']

## Case Study: Other Baud-Rate CDRs [3/5]

Recover d[4k+2] by using d[4k+1] & d[4k+3] & V<sub>I</sub><sup>E</sup>



## Case Study: Other Baud-Rate CDRs [3/5]

#### Phase detection operation



[D. Kim, JSSC 19']

## Case Study: Other Baud-Rate CDRs [4/5]

 2x half-baud-rate CDR combines the benefits of the 2x oversampling BBPD and MMPD (robustness and power saving)



4 samples in every other UI

#### On average, 2 samples/bit

[D. Yoo, CICC 19']

## Case Study: Other Baud-Rate CDRs [4/5]

Proposed PD is less sensitive to the Vref offset and residual ISI compared to the MMPD



## Case Study: Other Baud-Rate CDRs [5/5]

• SSMMSE PD in PAM4



$$\tau_{k+1} = \tau_k + \theta_{bb} \operatorname{sgn}(e_k) \operatorname{sgn}\left(\frac{\delta y(kT_b + \tau_k)}{\delta \tau_k}\right)$$

$$Early = D(e \odot s); Late = D(e \oplus s)$$



Figure 1.12: 4-PAM eye

Table 1.1: SSMMSE PD truth table

| Sampling point (Fig. 1.12) | Error (e) | Slope (s) | PD decision |
|----------------------------|-----------|-----------|-------------|
| A                          | 0         | 0         | Early       |
| B                          | 0         | 1         | Late        |
| D                          | 1         | 0         | Late        |
| E                          | 1         | 1         | Early       |
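
The truth table corresponds to an XNOR/XOR pair gated by `D`. In the sketch below, `d` is treated as a hypothetical enable bit that defaults to 1, since the slide does not define it; the function name `ssmmse_pd` is also an assumption.

```python
def ssmmse_pd(e, s, d=1):
    """Return (early, late) bits from error bit e and slope bit s.

    Early = D AND (e XNOR s), Late = D AND (e XOR s).
    d is modeled as an enable bit; its exact meaning is assumed here.
    """
    xor = e ^ s
    early = d & (1 - xor)   # XNOR of e and s
    late = d & xor          # XOR of e and s
    return early, late


if __name__ == "__main__":
    for point, e, s in (("A", 0, 0), ("B", 0, 1), ("D", 1, 0), ("E", 1, 1)):
        early, late = ssmmse_pd(e, s)
        print(point, "Early" if early else "Late")
```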


### References

[1] K. Mueller *et al.*, "Timing Recovery in Digital Synchronous Data Receivers," IEEE Transactions on Communications, May 1976.

[2] F. Spagna *et al.*, "A 78mW 11.8Gb/s serial link transceiver with adaptive RX equalization and baud-rate CDR in 32nm CMOS," ISSCC, 2010.

[3] Y. Kim *et al.*, "A 10-Gb/s Reference-Less Baud-Rate CDR for Low Power Consumption With the Direct Feedback Method," TCASII, Nov. 2018.

[4] M. Choi *et al.*, "A 0.1-pJ/b/dB 28-Gb/s Maximum-Eye Tracking, Weight-Adjusting MM CDR and Adaptive DFE with Single Shared Error Sampler," VLSI, 2020.

[5] T. Shibasaki *et al.*, "A 56Gb/s NRZ-electrical 247mW/lane serial-link transceiver in 28nm CMOS," ISSCC, 2016.

[6] J. Han *et al.*, "Design Techniques for a 60-Gb/s 288-mW NRZ Transceiver With Adaptive Equalization and Baud-Rate Clock and Data Recovery in 65-nm CMOS Technology," JSSC, Dec. 2017.

[7] D. Kim et al., "A 15-Gb/s Sub-Baud-Rate Digital CDR," JSSC, Mar. 2019.

[8] D. Yoo *et al.*, "A 30Gb/s 2x Half-Baud-Rate CDR," CICC, 2019.

[9] F. Musa and A. Chan Carusone, "Clock recovery in high speed multilevel serial links," ISCAS, 2003.