# A fine time-resolution (< 3 ps-rms) Time-to-Digital Converter for Highly Integrated Designs

Lukas Perktold, Graz University of Technology, Rechbauerstr. 12, 8010 Graz, Austria

Jørgen Christiansen, CERN, CH-1211 Geneva 23, Switzerland

Abstract—A multi-channel time-to-digital converter (TDC) with 3 ps-rms single-shot precision is presented. The time interpolation is based on a delay-locked loop (DLL) employing resistive interpolation to achieve least-significant-bit (LSB) sizes as small as 5 ps. To calibrate out device mismatches, only the timing-reference signals need to be calibrated, which avoids the usual need to calibrate each channel individually. After calibration, the measured differential non-linearity (DNL) and integral non-linearity (INL) are $\pm 0.9$ LSB and $\pm 1.3$ LSB, respectively. A prototype, implemented in a commercial 130 nm technology, consumes between 34 mW and 42 mW per channel and shows a voltage sensitivity of -0.19 ps/mV and a temperature dependence of 0.44 ps/°C. To the best of our knowledge, this is the first TDC to demonstrate single-shot precisions below 3 ps-rms on multiple channels.

## I. INTRODUCTION

Time measurements have long been used in medical imaging applications [1], biological research [2] and high-energy physics experiments [3]. In positron emission tomography (PET), for example, high-precision time measurements allow the origin of a photon's emission to be determined precisely. To cover the full area of interest, such systems require several hundreds or thousands of channels.

Recently, new particle sensors achieving time precisions in the 10 ps-rms regime have been reported [4], [5]. To fully exploit the capabilities of such sensors, TDCs with very fine single-shot precision are essential. Recent TDC designs have achieved single-channel time resolutions in the 1 ps range [6], [7]. At a resolution of a few ps, calibration mechanisms that remove device mismatches become necessary. Such mechanisms often operate locally and require each channel to be calibrated individually. In multi-channel TDC architectures, per-channel calibration can represent a considerable circuit overhead and a time-consuming task during production.

In this paper we present an architecture that overcomes these limitations by employing a global interpolation concept based on a DLL and resistive interpolation. No per-channel calibration mechanism is needed; calibration is applied only to larger groups of channels. Important design trade-offs as well as measurement results of a prototype chip implemented in a commercial 130 nm technology are presented and discussed.



Fig. 1. Proposed multi-channel TDC architecture. The architecture is designed to reduce the calibration effort in a multi-channel TDC. LSB sizes as small as 5 ps can be achieved.

## II. ARCHITECTURE

A block diagram of the basic TDC architecture is shown in figure 1. An external reference clock (*REFCLK*) serves as the time base of the TDC. This reference clock is fed both to a global fine-time interpolator stage, as reported in [8], and to a coarse counter that counts whole clock cycles. The fine-time code generated by the interpolator and the state of the coarse counter are distributed to the time capture registers. Several channels are grouped into segments, each served by dedicated distribution buffers with sufficient driving capability. This keeps signal edges sharp and avoids degradation due to the RC delays of the wires. A calibration circuit to calibrate out device mismatches is included in the distribution buffers. When a *HIT* arrives, the counter value and the fine-time code are latched to precisely capture the time of arrival of the event.

For large systems implementing hundreds or thousands of channels, it is essential to keep the whole system synchronized to a single reference. The presented architecture can easily be scaled to a larger system by distributing one common reference to all TDCs. A DLL in the interpolator automatically adjusts the fine-time interpolator to track process-voltage-temperature (PVT) variations. The size of one LSB therefore depends only on the input reference clock frequency and can be varied by changing the interpolator's input frequency. This *auto-adjust* feature not only guarantees the same timing performance across large systems but also makes it possible to trade off power consumption against time resolution, so that the timing performance of the TDC can be matched precisely to the system's needs.

Fig. 2. The fine-time interpolator architecture generating the 128 signals of the fine-time code. A two-stage interpolation concept is pursued. An interpolation factor of N = 32 in the $1^{st}$ stage and M = 4 in the $2^{nd}$ stage is employed.

### A. Fine-Time Interpolator

The fine-time interpolator generates a set of uniformly distributed signals. A block diagram of the fine-time interpolator is shown in figure 2. The time interpolation is accomplished in two stages. In the first stage, the reference clock period is subdivided into N = 32 smaller time intervals. In the second stage, resistive time interpolation [9] is used to overcome the gate-delay limitation of the technology by further dividing the time intervals of the $1^{st}$ stage by a factor of M = 4. In total, a set of 128 uniformly distributed signals is generated, referred to as the fine-time code of the TDC. A DLL adjusts the delays of the fine-time code signals to precisely cover one complete reference clock period. LSB sizes ranging from 20 ps down to 5 ps can be generated by adjusting the input frequency from 390.625 MHz up to 1562.5 MHz.
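The frequency range above maps onto the LSB range through the relation LSB = T_ref / (N·M) with N·M = 128 bins per period. The short Python sketch below reproduces that arithmetic; it assumes ideal, mismatch-free bins, and the helper names `lsb_ps` and `f_ref_for_lsb_mhz` are hypothetical.

```python
# Nominal LSB size of the two-stage interpolator versus REFCLK frequency.
# Assumes the ideal relation LSB = T_ref / (N * M); mismatch is ignored.

N_DLL = 32  # 1st-stage (DLL) interpolation factor
M_RES = 4   # 2nd-stage (resistive) interpolation factor


def lsb_ps(f_ref_mhz: float, n: int = N_DLL, m: int = M_RES) -> float:
    """Return the nominal LSB in ps for a reference clock given in MHz."""
    period_ps = 1e6 / f_ref_mhz  # one period in ps (1 MHz -> 1e6 ps)
    return period_ps / (n * m)


def f_ref_for_lsb_mhz(lsb: float, n: int = N_DLL, m: int = M_RES) -> float:
    """Inverse relation: REFCLK frequency (MHz) needed for a given LSB (ps)."""
    return 1e6 / (lsb * n * m)


if __name__ == "__main__":
    for f in (390.625, 781.25, 1562.5):
        print(f"{f:8.3f} MHz -> LSB = {lsb_ps(f):.2f} ps")
```

Running it prints LSB sizes of 20.00 ps, 10.00 ps and 5.00 ps, matching the end points of the stated range.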

### B. Time Capturing

To capture the time of arrival of a hit, the fine-time code and the counter state are latched into the time capture registers. Figure 3 shows the basic architecture of this concept. The hit signals are connected to the clock inputs of the registers, while the fine-time code and the counter values are connected to their D-inputs. Because the hit, not the reference clock, initiates the capturing process, the counter may be switching its state at the moment a hit is captured. The registers might then latch an invalid counter code. To resolve this issue, a second counter, clocked on the inverted clock phase, can be used. The fine-time code then indicates which of the two counter values is valid.
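The selection between the two counters can be sketched as follows. The paper does not specify how the fine-time code is mapped to a counter choice, so everything here is an assumption: fine bin 0 starts at the REFCLK rising edge, the hypothetical `counter_rise` increments on rising edges, and `counter_fall` increments half a period later on falling edges (so it lags by one during the first half of every period). Each counter is used only in the quarter-periods far away from its own switching edge.

```python
# Hypothetical coarse-count selection for a dual-counter scheme.
# Assumptions: bin 0 starts at the REFCLK rising edge; counter_rise
# increments on rising edges, counter_fall on falling edges, and
# counter_fall lags counter_rise by one during the first half-period.


def select_coarse_count(fine_bin: int, counter_rise: int, counter_fall: int,
                        n_bins: int = 128) -> int:
    """Return a valid coarse count using whichever counter was stable."""
    quarter = n_bins // 4
    if quarter <= fine_bin < 3 * quarter:
        # Hit far from the rising edge: counter_rise is stable.
        return counter_rise
    if fine_bin < quarter:
        # Hit just after a rising edge: counter_rise may be switching;
        # counter_fall is stable but has not yet caught up.
        return counter_fall + 1
    # Hit just before the next rising edge: counter_fall is stable and
    # already equals the index of the current period.
    return counter_fall
```

Near the rising edge the result no longer depends on the possibly corrupted `counter_rise` value, which is the point of the second counter.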

A timing diagram illustrating the latching of the fine-time code is shown in figure 4. In the illustrated example, the DLL has an interpolation factor of N = 8, and the second stage further subdivides the time intervals by a factor of M = 4. The smallest time interval is referred to as the LSB of the TDC. On the arrival of the hit, the states of all bins are latched into the time capture registers, generating a digital code from which the precise time of arrival of the hit can be extracted. The measured time is referenced to *REFCLK* and is assigned to the middle of the detected bin, which equates to 5.5 LSBs in our example.

Fig. 3. The time of arrival of a hit is captured by latching the fine-time code and the counter state into the time capturing registers.

Fig. 4. Operating principle of the time capturing process. In the example shown, $T_{meas} = 5 \cdot LSB + LSB/2$.
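A minimal decoding sketch for the bin-center assignment described above follows. The paper does not state the encoding of the fine-time code, so this assumes each fine-time signal i is a 50% duty-cycle copy of *REFCLK* delayed by i·LSB. Under that assumption the latched code is a circular run of ones, and the last '1' before a '0' marks the hit bin. `latched_code`, `decode_fine_bin` and `hit_time_in_lsb` are hypothetical names, and the sketch does not handle bubble errors.

```python
# Sketch: decode a latched fine-time code into a time in LSB units.
# Assumption: signal i is REFCLK (50 % duty cycle) delayed by i * LSB.

from typing import List, Sequence


def latched_code(t_lsb: float, n_bins: int) -> List[int]:
    """Model the code latched at t_lsb (in LSB) after the REFCLK edge."""
    return [1 if (t_lsb - i) % n_bins < n_bins / 2 else 0 for i in range(n_bins)]


def decode_fine_bin(code: Sequence[int]) -> int:
    """Return the index i with code[i] == 1 and code[i + 1] == 0 (circular)."""
    n = len(code)
    for i in range(n):
        if code[i] == 1 and code[(i + 1) % n] == 0:
            return i
    raise ValueError("no 1->0 transition found: invalid code")


def hit_time_in_lsb(code: Sequence[int]) -> float:
    """Assign the hit to the middle of the detected bin."""
    return decode_fine_bin(code) + 0.5


if __name__ == "__main__":
    # The N = 8, M = 4 example (32 bins): a hit 5.3 LSB after REFCLK.
    print(hit_time_in_lsb(latched_code(5.3, 32)))
```

For the 32-bin example this yields 5.5, i.e. $5 \cdot LSB + LSB/2$.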

The time capture registers have been custom designed and optimized for timing, area and power consumption. Figure 5 shows the schematic of the register, and table I lists its device dimensions. While the *clk* input is low, the feedback path of the $1^{st}$ latch of the register is interrupted, so the $1^{st}$ latch is transparent and follows its input *D*. On the rising edge of *clk*, the state of input *D* is latched.

## III. TIMING PRECISION

The single-shot precision of a TDC is usually evaluated with a uniform (flat) distribution of events over time. It can be estimated as the quadrature sum of a set of uncorrelated contributions:

$$\sigma_{rms} = \sqrt{\sigma_{qDNL}^2 + \sigma_{wINL}^2 + \sigma_{clk}^2 + \sigma_{tdc}^2 + \sigma_{hit}^2}. \tag{1}$$

Here, $\sigma_{qDNL}$ represents the rms value of the quantization error, evaluated for each bin and weighted by the bin's probability $\rho_i$ of being hit:

$$\sigma_{qDNL} = \sqrt{\frac{1}{N} \cdot \sum_{i=0}^{N-1} \left(\frac{LSB_i}{\sqrt{12}} \cdot \rho_i\right)^2}, \tag{2}$$

where $N$ equals the number of bins within the DLL (e.g., 128). The weighting factor $\rho_i$ accounts for the likelihood of a bin being hit due to its size. Wider bins are more likely to be hit and thus contribute more significantly to the timing precision of the TDC than narrower ones. Because wider bins both provide a poorer timing precision and collect more hits, their impact on the final precision is amplified. The contribution of the INL, represented by $\sigma_{wINL}$, can be taken into account in a similar manner by calculating the rms variation of the INL, with each bin weighted by its probability of being hit ($\rho_i$). The INL contribution includes LSB mismatches coming from process variations as well as LSB variations due to *REFCLK*-synchronous noise on the power supply. The jitter resulting from the DLL reference clock, the thermal and supply noise of the circuitry, and the jitter present on the hit signal itself are denoted as $\sigma_{clk}$, $\sigma_{tdc}$ and $\sigma_{hit}$, respectively.

Fig. 5. Schematic diagram of the time capture registers. The $1^{st}$ latch is optimized for timing.

TABLE I

Device dimensions of the time capture register

| Device    | Width  | Length  |
|-----------|--------|---------|
| T1 - T7   | 3 µm   | 0.12 µm |
| T8 - T14  | 1.5 µm | 0.12 µm |
| T15 - T19 | 1 µm   | 0.12 µm |
| T20 - T24 | 0.5 µm | 0.12 µm |
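Equations (1) to (3) can be evaluated numerically as sketched below. Two points are assumptions, since the text does not pin them down. First, $\rho_i$ is normalized to the mean bin width ($\rho_i = LSB_i / \overline{LSB}$), so an ideal TDC yields $LSB/\sqrt{12}$. Second, $\sigma_{wINL}$ applies the same weighting to INL values given in ps; this is a hypothetical analogue of equation (2). All function names are illustrative.

```python
# Weighted error terms of equations (1)-(3).
# Assumptions: rho_i = LSB_i / mean(LSB) (mean weight 1), and sigma_wINL
# weights INL values (in ps) the same way as equation (2) weights LSB_i.

import math
from typing import List, Sequence


def hit_weights(lsb_ps: Sequence[float]) -> List[float]:
    """Relative hit probability of each bin (mean weight = 1)."""
    mean = sum(lsb_ps) / len(lsb_ps)
    return [w / mean for w in lsb_ps]


def sigma_qdnl(lsb_ps: Sequence[float]) -> float:
    """Equation (2): weighted rms quantization error in ps."""
    rho = hit_weights(lsb_ps)
    terms = [(w / math.sqrt(12) * r) ** 2 for w, r in zip(lsb_ps, rho)]
    return math.sqrt(sum(terms) / len(terms))


def sigma_winl(inl_ps: Sequence[float], lsb_ps: Sequence[float]) -> float:
    """Weighted rms INL in ps, weighted like equation (2)."""
    rho = hit_weights(lsb_ps)
    terms = [(e * r) ** 2 for e, r in zip(inl_ps, rho)]
    return math.sqrt(sum(terms) / len(terms))


def quadrature(*sigmas: float) -> float:
    """Equations (1) and (3): root-sum-square of uncorrelated terms."""
    return math.sqrt(sum(s * s for s in sigmas))


if __name__ == "__main__":
    ideal = [5.0] * 128
    print(f"ideal sigma_qDNL = {sigma_qdnl(ideal):.2f} ps")
```

For 128 ideal 5 ps bins this prints 1.44 ps. Alternating 4 ps and 6 ps bins raise $\sigma_{qDNL}$ to about 1.61 ps, illustrating how the weighting penalizes wide bins.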

### A. Device Mismatch

With a 5 ps LSB size, the best-case precision that can be achieved (the quantization limit, $LSB/\sqrt{12}$) is as low as 1.44 ps-rms. Careful post-layout Monte Carlo simulations have been run to balance device mismatch against power consumption. The fine-time interpolator shows a 1 sigma variation of the DNL of 1.24 ps-rms; this is the rms average of the 1 sigma variation of the LSB size over several runs. Circuit diagrams and device sizes of the interpolator structure have been reported in [8]. At the output of the distribution buffers, the 1 sigma variation of the DNL was estimated from simulation to be 1.76 ps. This value also includes the mismatch resulting from the fine-time interpolator.

Special care was taken to dimension the time capture registers for good timing performance in the presence of device mismatch. To analyze the mismatch of the registers, the signal connected to their D-input was swept across the switching point in 0.1 ps steps. This allowed the timing uncertainty of a single register in the presence of device mismatch to be identified and the time capturing point of the register to be monitored. Only the first latch of the register defines its time capturing characteristics and decides whether a logic one or a logic zero is registered. The input-to-output propagation delay is not relevant and can show larger deviations. The time capture registers have been designed to keep the 1 sigma variation of the time capturing point as low as 1.31 ps. As the size of one LSB is defined by two edges, this value needs to be multiplied by $\sqrt{2}$ to obtain the expected 1 sigma variation of the DNL, which equates to 1.85 ps in our case.

Fig. 6. Capacitive loading is used to calibrate out mismatches introduced by the fine-time interpolator and the buffers distributing the fine-time code.
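The two reference values of this subsection follow from short calculations: the quantization limit of an ideal 5 ps bin, and the DNL spread obtained when two capture edges, each with a 1 sigma uncertainty $\sigma_{edge} = 1.31$ ps, define one bin. The second line assumes that the capturing points of neighbouring registers vary independently.

$$\sigma_{q,ideal} = \frac{LSB}{\sqrt{12}} = \frac{5\,\mathrm{ps}}{\sqrt{12}} \approx 1.44\,\mathrm{ps}$$

$$\sigma_{DNL} = \sqrt{\sigma_{edge}^2 + \sigma_{edge}^2} = \sqrt{2} \cdot 1.31\,\mathrm{ps} \approx 1.85\,\mathrm{ps}$$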

### B. Calibration

From Monte Carlo simulations, DNL deviations as large as $\pm 9$ ps from the nominal value are expected to result from device mismatches in the interpolator and the distribution buffers. Such deviations can lead to missing codes in the transfer characteristic of the TDC and degrade the timing precision. To calibrate out these mismatches, a variable capacitive load, as depicted in figure 6, is included in the distribution buffers. A total of 64 fF can be added in 2 fF steps, which allows each signal of the fine-time code to be delayed by $\pm 16$ ps in 1 ps steps. Such a large adjustable range allows not only DNL errors to be compensated but also INL errors spanning up to 6.4 LSBs (32 ps at a 5 ps LSB) to be corrected. To preserve sharp signal edges at the buffer outputs, the calibration is applied at internal nodes of the buffers.
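The paper does not describe how the calibration settings are derived from measured data, so the following is only one plausible single-pass procedure. It assumes four things: edge k of the fine-time code is the start of bin k, measured relative to edge 0 in a code density test; setting 16 of 0 to 32 (32 fF of 64 fF) is the nominal load; each 2 fF step shifts an edge by 1 ps; and a late edge needs less load. `edge_errors_ps` and `trim_settings` are hypothetical names. In practice, several iterations would likely be needed.

```python
# Hypothetical single-pass trim of the capacitive calibration (Fig. 6).
# Assumptions: setting 16 of 0..32 is the nominal load, one 2 fF step
# shifts an edge by 1 ps, and edge k is the start of bin k.

from typing import List, Sequence

STEP_PS = 1.0      # delay per 2 fF step
MID_SETTING = 16   # nominal setting, leaves +/-16 ps of range
MAX_SETTING = 32   # 64 fF / 2 fF


def edge_errors_ps(counts: Sequence[int], period_ps: float) -> List[float]:
    """Deviation of each bin's start edge from its ideal position, in ps."""
    n = len(counts)
    total = sum(counts)
    ideal_lsb = period_ps / n
    errors: List[float] = []
    edge = 0.0
    for k, c in enumerate(counts):
        errors.append(edge - k * ideal_lsb)
        edge += period_ps * c / total  # measured width of bin k
    return errors


def trim_settings(errors: Sequence[float]) -> List[int]:
    """Capacitor setting per signal that moves each edge back to ideal."""
    settings = []
    for err in errors:
        # A late edge (err > 0) needs less load, an early one more.
        steps = round(-err / STEP_PS)
        settings.append(max(0, min(MAX_SETTING, MID_SETTING + steps)))
    return settings
```

For a toy four-bin histogram of 100, 140, 60 and 100 hits over a 20 ps period, edge 2 sits 2 ps late, and the sketch lowers that signal's setting from 16 to 14. Errors beyond the ±16 ps range saturate at the end settings.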

The calibration circuit acts on a set of channels and can only correct for variations originating in the fine-time interpolator and the distribution buffers. Variations resulting from the time capture registers are not corrected; the registers have therefore been designed to avoid introducing large code mismatches.

### C. Jitter

Whereas device mismatch can be estimated well at design time, the amount of jitter introduced into the system is very difficult to quantify in advance. The jitter introduced by the TDC depends on the circuit activity as well as on the susceptibility to noise on the power rails. The amount of jitter introduced grows with a signal's propagation delay and decreases with its slew rate. For a robust design, short delay paths and sharp edges for timing-critical signals, i.e., the DLL reference clock, the fine-time code signals and the hit signals, are crucial. The proposed architecture optimizes both the propagation delay and the edge slopes of all timing-critical signals. All interpolation steps are accomplished globally by the fine-time interpolator. This avoids the need to additionally delay the hit signals to generate the smallest LSB sizes and keeps the propagation delay path of the hits at a minimum. The hit signals only need to propagate through circuitry with a total propagation delay of less than 600 ps, covering the complete signal path from the i/o receiver to the clock input of the registers. To distribute the hit to the array of 128 time capture registers (TCRs), a balanced H-tree fan-out structure with intermediate buffer stages to keep sharp signal edges is employed. The buffers have been designed to introduce only negligible additional mismatch between bins ($\sigma = 0.2$ ps-rms).

The presented architecture employs fast delay cells in the DLL, as described in [8], to keep the propagation delay of the DLL as low as possible. At the highest reference clock frequency, the maximum delay introduced by the DLL is as low as 640 ps. Additionally, the interpolator introduces a propagation delay of approximately 300 ps for i/o, buffering and signal conversion. Strong drivers are employed to distribute the fine-time code signals, each stage introducing less than 100 ps of delay.

The 1 sigma cycle-to-cycle jitter due to thermal noise introduced by the 32 delay buffers of the DLL was simulated to be below 1 ps-rms. To further reduce the susceptibility to noise on the power rails, the DLL is built fully differentially, and the fine-time interpolator and the channel matrix run on dedicated power domains, making use of the substrate isolation features of the technology.

## IV. EXPERIMENTAL RESULTS

A prototype chip has been fabricated in a commercial 130 nm technology and successfully tested. The fine-time code generator and a total of 8 channels with different configurations, to evaluate effects on device mismatch, input buffer architecture and time capturing concept, have been implemented. The results reported here use the time capture register of figure 5, employ both implemented types of input buffers for receiving the hit, and are based on the time capturing concept described in section II-B. A simple serial interface to read out the captured data is integrated in the channel matrix. A micrograph of the chip wire-bonded to the carrier PCB is shown in figure 7. The interpolator is supplied by a 1.3 V power supply and operated with a 1562.5 MHz clock, nominally generating LSB sizes of 5 ps. A summary of the achieved performance is listed in table II. The high-precision reference clock for the TDC is provided by a low-jitter clock source (SRS CG635) with a measured cycle jitter below 1 ps-rms.

### A. Nonlinearity of the Interpolator

To test the nonlinearity of the interpolator, a code density test using $10^5$ hits per channel was performed. Based on the extracted histograms, the mismatch from the interpolator was calibrated out using the global on-chip calibration circuit. An example of a code density test of a channel for all 128 bins of the interpolator before and after calibration is shown in figures 8a and 8b, respectively. The extracted DNL and INL of different channels after global calibration are shown in figure 9 and figure 10, respectively. A DNL better than $\pm 0.9$ LSB and an INL better than $\pm 1.3$ LSB were achieved.

Fig. 7. Micrograph of the prototype produced in a commercial 130 nm technology. The fine-time interpolator, the distribution buffers and calibration, and the time capture registers of 8 channels are shown.

TABLE II

Performance summary of the TDC

Fig. 8. Code density test of channel A employing $10^5$ hits. All 128 bins of the interpolator are shown: (a) before global calibration and (b) after global calibration.
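The DNL and INL of a code density test follow from the fact that, for hits uniformly distributed over the *REFCLK* period, each bin's count is proportional to its width. The sketch below uses an end-of-bin INL convention (cumulative DNL); this is an assumption, since other conventions, for example bin centres, shift the INL by half a DNL step. With $10^5$ hits over 128 bins, each bin collects about 781 hits, so counting statistics alone add roughly $1/\sqrt{781} \approx 0.036$ LSB of uncertainty per bin.

```python
# DNL and INL (in LSB) from code density histogram counts.
# Assumption: INL_k is the cumulative DNL up to and including bin k.

from typing import List, Sequence, Tuple


def dnl_inl_lsb(counts: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Return (DNL, INL) in LSB from per-bin hit counts."""
    mean = sum(counts) / len(counts)
    dnl = [c / mean - 1.0 for c in counts]
    inl: List[float] = []
    acc = 0.0
    for d in dnl:
        acc += d
        inl.append(acc)
    return dnl, inl


def peak_abs(values: Sequence[float]) -> float:
    """Largest absolute deviation, as quoted in +/- LSB figures."""
    return max(abs(v) for v in values)
```

For a toy histogram of 100, 140, 60 and 100 hits, the DNL is 0, +0.4, -0.4 and 0 LSB, and the INL is 0, +0.4, 0 and 0 LSB.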



Fig. 9. Measured DNL of all 128 bins of the interpolator after global calibration. The boxes list the 1 sigma distribution of the DNL as well as the weighted quantization error calculated according to equation (2).

After calibration, ideally only the device mismatch coming from the time capture registers should remain. From simulation, the estimated 1 sigma value of the DNL resulting from the time capture registers was 1.85 ps-rms. The measured 1 sigma value of the DNL after calibration is below 1.4 ps, which is slightly better than expected from simulation.

### B. Single-Shot Precision

To evaluate the single-shot precision of the TDC, a single hit is sent to two distinct channels with a constant phase difference between them. The phase difference is varied by using different wire lengths. Fast input buffers have been employed for receiving the hit signals. To ensure that the chosen phase delay does not coincide with a period of the INL, and to evaluate the performance over longer measurement cycles, the measurement is repeated with different phase delays. As the hit is generated at a random time, it can fall into any bin of the interpolator, while nominally always generating the same difference in bin counts between the two channels. This method eliminates the jitter contribution of the hit ($\sigma_{hit} = 0$ ps-rms), as the same hit signal is fed to both active channels. All other contributions listed in equation (1) are included in the measurement. To obtain the single-shot precision of the TDC, the measured result needs to be scaled by $1/\sqrt{2}$, because the measurement contains the timing uncertainties of two channels.

Fig. 10. Measured INL of all 128 bins of the interpolator after global calibration. The boxes list the mean-free 1 sigma distribution of the INL as well as its weighted rms error, calculated in a similar manner to equation (2).
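The scaling from the two-channel (double-shot) spread to the single-shot precision can be written out as a short Python sketch. It assumes the two channels' uncertainties are uncorrelated and equal; the function names are hypothetical. Jitter common to both channels, such as that of the shared hit, cancels in the difference before the $1/\sqrt{2}$ scaling is applied.

```python
# Single-shot precision from the two-channel (double-shot) measurement.
# Assumes equal, uncorrelated channel uncertainties; shared hit jitter
# cancels in the difference of the two time codes.

import math
import statistics
from typing import Sequence


def double_to_single(sigma_double: float) -> float:
    """Scale a double-shot rms value by 1/sqrt(2)."""
    return sigma_double / math.sqrt(2)


def single_shot_ps(code_a: Sequence[int], code_b: Sequence[int],
                   lsb_ps: float) -> float:
    """Single-shot precision in ps-rms from paired time codes (in LSB)."""
    diffs = [b - a for a, b in zip(code_a, code_b)]
    sigma_diff_lsb = statistics.pstdev(diffs)  # population std of differences
    return double_to_single(sigma_diff_lsb * lsb_ps)


if __name__ == "__main__":
    print(f"{double_to_single(3.45):.2f} ps-rms")
```

Applied to the 3.45 ps-rms double-shot figure reported below, this gives 2.44 ps-rms.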

Three different series of measurements were performed: (1) offsets within one period of the DLL reference clock; (2) offsets greater than one DLL reference clock period; and (3) an offset of 5 ns. Different wire lengths are used to generate propagation delay differences from 0 ps to 5 ns; wire length differences of zero, one, two, four, eight and 39.4 inches are used. The results are shown in figure 11.

Across the whole measurement series, a double-shot precision better than 3.45 ps-rms was achieved, which equates to a single-shot precision better than 2.44 ps-rms. The jitter measurement was performed on the channel pair C & D. Calculated with equation (3), the expected timing precision due to the quantization noise and the measured non-linearities of the interpolator is 2.34 ps-rms and 2.53 ps-rms (channels C and D, respectively), which is well in line with the measurement results.

$$\sigma_{qDNL/wINL} = \sqrt{\sigma_{qDNL}^2 + \sigma_{wINL}^2} \tag{3}$$

The parameters to calculate $\sigma_{qDNL/wINL}$ are listed in figure 9 and figure 10. The jitter contributions of the TDC circuit ($\sigma_{tdc}$) and of the DLL reference input clock ($\sigma_{clk}$) were estimated by simulation and measurement to be approximately 1 ps-rms and are included in the measurement.

Fig. 11. Measured double-shot precision of the TDC in LSB for different wire delays (0, 1, 2, 4, 8 and 39.4 inch). To obtain the single-shot precision in ps, *Sigma* needs to be multiplied by $5 \text{ ps}/\sqrt{2}$.

Fig. 12. Measured delay variations due to voltage and temperature shifts. The measurements have been performed at a nominal supply of 1.3 V, an operating temperature of 31 °C and a 1280 MHz reference clock frequency (*REFCLK*).

### C. PVT Robustness

PVT variations can result in a constant delay shift between the fine-time code signals derived from the DLL reference clock and the hit signals. Figure 12 shows the delay variation in the presence of temperature and voltage variations, measured using the slow hit input buffers. A voltage sensitivity of -0.19 ps/mV and a temperature dependence of 0.44 ps/°C were observed; both are independent of the bin. The delay of the hit was arbitrarily adjusted to fall into bin 98, which equates to 598 ps.
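The bin-98 figure and the measured sensitivities can be related with a little arithmetic. At the 1280 MHz reference used for figure 12, one LSB is 781.25 ps / 128 ≈ 6.1 ps. The sketch below assumes ideal bins and a drift that is linear in supply voltage and temperature around the nominal 1.3 V / 31 °C operating point; the function names are hypothetical.

```python
# Bin-to-time conversion and a linear PVT drift estimate for Fig. 12.
# Assumptions: ideal 128-bin LSB at 1280 MHz; drift linear in supply and
# temperature around the nominal 1.3 V / 31 degC operating point.

F_REF_MHZ = 1280.0
N_BINS = 128
VOLT_SENS_PS_PER_MV = -0.19   # measured voltage sensitivity
TEMP_SENS_PS_PER_DEGC = 0.44  # measured temperature dependence


def bin_start_ps(bin_index: int, f_ref_mhz: float = F_REF_MHZ) -> float:
    """Start time of a bin after the REFCLK edge, in ps."""
    return bin_index * (1e6 / f_ref_mhz) / N_BINS


def delay_shift_ps(delta_mv: float, delta_degc: float) -> float:
    """Estimated shift of the measured hit delay, in ps."""
    return VOLT_SENS_PS_PER_MV * delta_mv + TEMP_SENS_PS_PER_DEGC * delta_degc


if __name__ == "__main__":
    print(f"bin 98 -> {bin_start_ps(98):.1f} ps")
    print(f"+50 mV -> {delay_shift_ps(50, 0):+.1f} ps")
    print(f"+10 degC -> {delay_shift_ps(0, 10):+.1f} ps")
```

Bin 98 starts at about 598.1 ps, consistent with the quoted 598 ps. Under the linear assumption, a 50 mV supply increase would shift the measured delay by about -9.5 ps (roughly 1.6 LSB at this frequency).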

### D. Power Consumption

The power consumption of the prototype has been evaluated for different DLL reference clock frequencies and operating conditions. With 5 ps LSB sizes and all channels in acquisition mode, the total power consumption is 335 mW, which equates to 42 mW/channel. Of this, 23% is consumed by the i/o of the prototype and 18% by the fine-time interpolator. Most of the power, namely 59%, is consumed by the switching of the time capture registers and of the distribution buffers that distribute the fine-time code. During data acquisition, the first latch of the time capture registers needs to be transparent to capture the time of arrival of a hit, which causes additional power consumption. For low hit rate applications, the power consumption can be decreased by keeping the acquisition time per channel to a minimum. In this case, the total power consumption of the TDC can be reduced to 268 mW, equating to 34 mW/channel.

The interpolator can also be operated at lower frequencies, which results in larger LSB sizes and lower time precision at lower power consumption. At 10 ps LSB sizes, the total power consumption is 204 mW (26 mW/channel) with continuously running acquisition and 169 mW (21 mW/channel) with stopped acquisition.
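The per-channel figures above appear to be the chip total divided by the 8 implemented channels, with i/o and interpolator power included. The short Python sketch below makes that assumption explicit and also splits the 335 mW total according to the quoted shares; the helper names are illustrative.

```python
# Power bookkeeping for the prototype figures quoted in this subsection.
# Assumption: per-channel values are the chip total divided by the 8
# implemented channels (i/o and interpolator included).

from typing import Dict

N_CHANNELS = 8

# Shares of the 335 mW total (5 ps LSB, acquisition running).
SHARES_5PS = {
    "i/o": 0.23,
    "fine-time interpolator": 0.18,
    "capture registers + distribution buffers": 0.59,
}


def per_channel_mw(total_mw: float, channels: int = N_CHANNELS) -> float:
    """Total chip power divided evenly over the channels."""
    return total_mw / channels


def breakdown_mw(total_mw: float, shares: Dict[str, float]) -> Dict[str, float]:
    """Split a total power figure according to fractional shares."""
    return {name: total_mw * share for name, share in shares.items()}


if __name__ == "__main__":
    for total in (335.0, 268.0, 204.0, 169.0):
        print(f"{total:5.0f} mW total -> {per_channel_mw(total):.1f} mW/channel")
    for name, mw in breakdown_mw(335.0, SHARES_5PS).items():
        print(f"{name}: {mw:.0f} mW")
```

The per-channel results of 41.9, 33.5, 25.5 and 21.1 mW round to the quoted 42, 34, 26 and 21 mW/channel. The 335 mW total splits into roughly 77 mW, 60 mW and 198 mW.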

## REFERENCES

- [1] D. Schwartz, E. Charbon, and K. Shepard, "A single-photon avalanche diode array for fluorescence lifetime imaging microscopy," *IEEE Journal of Solid-State Circuits*, vol. 43, no. 11, pp. 2546–2557, Nov. 2008.
- [2] S. Mandai and E. Charbon, "A 128-channel, 9 ps column-parallel two-stage TDC based on time difference amplification for time-resolved imaging," in *Proceedings of ESSCIRC 2011*, Sept. 2011, pp. 119–122.
- [3] M. Mota, J. Christiansen, S. Debieux, V. Ryjov, P. Moreira, and A. Marchioro, "A flexible multi-channel high-resolution time-to-digital converter ASIC," in *2000 IEEE Nuclear Science Symposium Conference Record*, vol. 2, 2000, pp. 9/155–9/159.
- [4] R. Forty and M. Charles, "TORCH: a novel time-of-flight detector concept," CERN, Geneva, Tech. Rep. LHCb-PUB-2009-030, CERN-LHCb-PUB-2009-030, Nov. 2009.
- [5] S. White, M. Chiu, M. Diwan, G. Atoyan, and V. Issakov, "Design of a 10 picosecond time of flight detector using avalanche photodiodes," 2009.
- [6] M. Lee and A. Abidi, "A 9b, 1.25 ps resolution coarse-fine time-to-digital converter in 90 nm CMOS that amplifies a time residue," *IEEE Journal of Solid-State Circuits*, vol. 43, no. 4, pp. 769–777, Apr. 2008.
- [7] P. Keranen, K. Maatta, and J. Kostamovaara, "Wide-range time-to-digital converter with 1-ps single-shot precision," *IEEE Transactions on Instrumentation and Measurement*, vol. 60, no. 9, pp. 3162–3172, Sept. 2011.
- [8] L. Perktold and J. Christiansen, "A flexible 5 ps bin-width timing core for next generation high-energy-physics time-to-digital converter applications," in *2012 8th Conference on Ph.D. Research in Microelectronics and Electronics (PRIME)*, pp. 1–4, Jun. 2012.
- [9] S. Henzler, S. Koeppe, D. Lorenz, W. Kamp, R. Kuenemund, and D. Schmitt-Landsiedel, "Variation tolerant high resolution and low latency time-to-digital converter," in *ESSCIRC 2007, 33rd European Solid State Circuits Conference*, Sept. 2007, pp. 194–197.