MATRIX16: A 16-Channel Low-Power TDC ASIC with 8 ps Time Resolution

Mauricio, Joan; Freixas, Lluís; Sanuy, Andreu; Gómez, Sergio; Manera, Rafel; Marín, Jesús; Pérez, Jose M.; Picatoste, Eduardo; Rato, Pedro; Sánchez, David; Sanmukh, Anand; Vela, Oscar; Gascon, David

doi:10.3390/electronics10151816

by Joan Mauricio ¹,*, Lluís Freixas ², Andreu Sanuy ¹, Sergio Gómez ³, Rafel Manera ¹, Jesús Marín ², Jose M. Pérez ², Eduardo Picatoste ¹, Pedro Rato ², David Sánchez ¹, Anand Sanmukh ¹, Oscar Vela ² and David Gascon ¹

¹ Department Física Quàntica i Astrofísica, Institut de Ciències Del Cosmos (ICCUB), University of Barcelona (UB), 08028 Barcelona, Spain
² Centro de Investigaciones Energéticas, Medioambientales y Tecnológicas (CIEMAT), 28040 Madrid, Spain
³ Institut d’Estudis Espacials de Catalunya (IEEC), ICCUB (University of Barcelona), 08034 Barcelona, Spain
* Author to whom correspondence should be addressed.

Electronics 2021, 10(15), 1816; https://doi.org/10.3390/electronics10151816

Submission received: 8 July 2021 / Revised: 22 July 2021 / Accepted: 26 July 2021 / Published: 29 July 2021

(This article belongs to the Special Issue Advances in Sensor Readout Electronics for Precise Timing)

Abstract: This paper presents a highly configurable 16-channel TDC ASIC designed in a commercial 180 nm technology with the following features: time-of-flight and time-over-threshold measurements, 8.6 ps LSB, 7.7 ps jitter, 5.6 ps linearity error, up to 5 MHz of sustained input rate per channel, 9.1 mW of power consumption per channel, and an area of 4.57 mm². The main contributions of this work are the novel design of the clock interpolation circuitry based on a resistive interpolation mesh circuit and the capability to operate at different supply voltages and operating frequencies, thus providing a compromise between TDC resolution and power consumption.

Keywords: TDC; time-to-digital converter; fast timing; PET; VLSI; ASIC; ToF; ToT; low power; frontend electronics

1. Introduction

Time-of-Flight (ToF) measurement is one of the major challenges in high-energy physics experiments [1], medical imaging [2], mass spectrometry [3], and Laser Imaging Detection and Ranging (LiDAR) [4], among others. Precise timing measurements allow computing the distance that a particle traveled and thus identifying tracks, performing coincidence measurements, or determining the distance to objects. On the other hand, Time-over-Threshold (ToT) provides the pulse width information, which has many applications: measuring the deposited energy of the detected particles [5] or applying time-walk corrections [6], among others.

Our research group has been working for years on fast-timing ASIC designs for Positron Emission Tomography (PET) applications [7,8]. The HR-FlexToT ASIC provides very good timing performance, namely a 60 ps Single-Photon Time Resolution (SPTR) (using a Hamamatsu S13360-3050 MPPC: 3 × 3 mm², 50 μm cell), and low power consumption (<3.5 mW/ch) [8]. The outputs of this chip are Continuous-Time Binary-Valued (CTBV) signals, so external equipment is required to perform fine timing measurements. The objective of the MATRIX16 ASIC is to digitize these outputs with the lowest possible power consumption (<10 mW per channel), to minimize scalability issues when building large PET systems with thousands of channels. Assuming that modern SiPMs offer a timing resolution better than 100 ps [9], the TDC resolution should be better than 20 ps so as not to degrade timing performance substantially.
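As a rough illustration of the 20 ps requirement, the sketch below assumes that the sensor and TDC timing contributions are independent and add in quadrature. This model and the function name `combined_resolution_ps` are assumptions made for illustration; they are not taken from the design itself.

```python
import math

def combined_resolution_ps(sensor_ps: float, tdc_ps: float) -> float:
    """Quadrature sum of two independent timing contributions (assumed model)."""
    return math.hypot(sensor_ps, tdc_ps)

sensor = 100.0  # SiPM timing resolution bound quoted in the text (ps)
tdc = 20.0      # TDC resolution target quoted in the text (ps)
total = combined_resolution_ps(sensor, tdc)
degradation = total - sensor
print(f"combined: {total:.2f} ps, degradation: {degradation:.2f} ps")
```

Under this assumption, a 20 ps TDC adds about 2 ps to a 100 ps sensor, i.e., roughly 2% of degradation.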

1.1. TDC Working Principle

A TDC is a device that converts the arrival time of a binary input pulse into a digital representation. In ToF applications, the internal TDC counters start counting synchronously, and the rising (or falling) edge of the incoming pulse latches the internal counter value (absolute time measurement). In ToF+ToT applications, both rising and falling edges are captured, so that the pulse width can also be computed. In start/stop TDCs, a time interval between two events is measured (relative time measurement).

Time digitization is typically performed by two counter levels: coarse and fine. The coarse counter counts the number of periods of the system clock, and the number of bits of this counter determines the dynamic range of the TDC. The resolution is typically in the ns range. The fine counter stage interpolates the system clock, and therefore, the resolution is scaled down to the picosecond range. This second level is one of the most critical parts of the design, and there are many ways to implement it, depending on the application requirements, technology, cost, scalability, etc.
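The two-level scheme can be illustrated with a minimal sketch. It assumes that a timestamp is reconstructed as the coarse count times the clock period plus the fine count times the fine LSB; this formula, the function names, and the numeric values in the example are hypothetical and only serve to show how ToF and ToT follow from two captured edges.

```python
def timestamp_ps(coarse: int, fine: int, clk_period_ps: float, lsb_ps: float) -> float:
    """Combine coarse and fine counter values into a time in ps (assumed convention)."""
    return coarse * clk_period_ps + fine * lsb_ps

def time_over_threshold_ps(rising_ps: float, falling_ps: float) -> float:
    """Pulse width from the two edge timestamps."""
    return falling_ps - rising_ps

# Hypothetical example: 1250 ps clock period and a 10 ps fine LSB.
rising = timestamp_ps(coarse=3, fine=42, clk_period_ps=1250.0, lsb_ps=10.0)
falling = timestamp_ps(coarse=5, fine=7, clk_period_ps=1250.0, lsb_ps=10.0)
print(rising, falling, time_over_threshold_ps(rising, falling))
```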

1.2. State-of-the-Art TDCs

Currently, there are two trends in TDC designs: FPGA based and ASIC based. FPGA TDCs use the fastest delay element (typically the carry logic circuitry) in the device to use it as a Tapped Delay Line (TDL), while ASIC TDCs can be customized for a given purpose.

In [10], the main contribution of the author was a bin realignment method and a dual-sampling method of a TDL implemented on an FPGA (two channels), aiming to reach the limit of Xilinx UltraScale FPGA delay granularity. The achieved resolution was 3.9 ps, and the dead time was only 4 ns. In [11], the authors proposed using the FPGA routing resources (1024 paths) as delay elements instead of using the traditional TDL method, achieving a 7.4 ps time bin, a DNL of 0.74 LSB, an INL of 1.57 LSB, and 0.92 LSB of jitter. The reported power consumption was 23 mW (single channel). Another alternative to reduce the bin size in FPGAs (as well as in ASICs) is to combine the information from multiple TDLs, leading to a stochastic TDC [12,13]. In this technique, the bin size scales down with $\sqrt{N_{TDL}}$, while the power consumption scales almost with $N_{TDL}$. In [12], a TDC bin size of 1.15 ps ($N_{TDL}$ = 20) and a single-shot precision of 3.5 ps were achieved. Moreover, the author proposed a temperature offset cancellation to compensate for bin size variations caused by temperature drifts. Lastly, it is important to remark that, of the cited FPGA TDC works, only [11] reported the power consumption, which suggests that this feature is not competitive on FPGAs.
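The trade-off of the stochastic approach can be sketched numerically. The snippet below uses an idealized model in which the bin size shrinks exactly with the square root of the number of delay lines and the power grows exactly linearly; the text only says the power "almost" scales this way, so the linear factor is an assumption, as is the function name.

```python
import math

def stochastic_scaling(n_tdl: int):
    """Return (bin size factor, power factor) relative to a single delay line."""
    bin_factor = 1.0 / math.sqrt(n_tdl)   # bin size scales down with sqrt(N_TDL)
    power_factor = float(n_tdl)           # idealized: power scales with N_TDL
    return bin_factor, power_factor

for n in (1, 4, 20):
    bin_factor, power_factor = stochastic_scaling(n)
    print(f"N_TDL={n}: bin x{bin_factor:.3f}, power x{power_factor:.0f}")
```

In this model, quadrupling the number of delay lines halves the bin size at four times the power, which is why the technique is costly when power per channel is constrained.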

The most common fine interpolation stage implementations in ASIC-based TDCs can be divided into three groups:

Flash: This consists of a clock delay line where each stage is sampled by a flip-flop controlled by the input hit edge. Flip-flop outputs are then encoded into a binary counter. The number of stages must be enough to cover at least half a period of the reference clock. This implementation is dead time free and suitable for applications with high conversion rates. However, TDC resolution is limited by the minimum delay element, which depends on the CMOS process technology. In [14], subdelay was achieved by interpolating consecutive delay stages with N resistors in between. In [15], an array of adjustable scaled load capacitors was used to achieve subdelay;
Vernier: This aims to improve the TDC resolution beyond the minimum delay element. In this case, two delay lines oscillate at periods $t_{1}$ and $t_{2}$, with an initial phase shift $ϕ_{0}$ corresponding to the fine interpolation delay to be measured [16,17,18]. Thus, the faster delay line catches the slower one after $ϕ_{0} / Δ_{T}$ periods, with the TDC resolution being $Δ_{T} = t_{2} - t_{1}$. The number of logic resources tends to be lower than in the Flash implementation, but the dead time ($= t_{Clk}^{2} / Δ_{T}$) dramatically increases as $Δ_{T}$ scales down or the dynamic range increases. To expand the range without penalizing the resolution, Reference [19] proposed a taped 2D Vernier ring TDC, achieving a 1 ps timing resolution;
Time Difference Amplification (TDA): The pulse corresponding to the time difference between the input hit edge and the reference clock (≤1 clock period) is amplified by an analog time stretcher, and hence, the resulting pulse can be converted with a lower resolution TDC [20]. The main challenges of this implementation are the linearity and dead time, which constrain the amplification factor.

1.3. TDC Implementation Choice

The choice of the ASIC TDC fine interpolation stage architecture mainly depends on the conversion rate (maximum allowable dead time), resolution (bin size), power consumption, and technology node. The Flash architecture was chosen in this work since the pulse width of the incoming signal was in the few ns range, and each hit edge required a timing measurement (ToT). Even using two independent conversion stages (one for each hit edge), a Vernier implementation with the maximum acceptable dead time (100 ns) would require oscillating at more than 1 GHz to achieve a 10 ps time bin, which is challenging in 180 nm technology.
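The 1 GHz figure is consistent with the Vernier dead-time expression given in Section 1.2 (dead time $= t_{Clk}^{2} / Δ_{T}$). The sketch below simply inverts that expression; reading the 1 GHz figure this way is an interpretation, and the function name is hypothetical.

```python
import math

def vernier_min_clock_hz(dead_time_s: float, resolution_s: float) -> float:
    """Clock frequency for which t_clk**2 / delta_t equals the given dead time."""
    # dead_time = t_clk**2 / delta_t  =>  t_clk = sqrt(dead_time * delta_t)
    t_clk = math.sqrt(dead_time_s * resolution_s)
    return 1.0 / t_clk

f_min = vernier_min_clock_hz(dead_time_s=100e-9, resolution_s=10e-12)
print(f"{f_min / 1e9:.2f} GHz")
```

A shorter clock period reduces the dead time, so any dead time below 100 ns at a 10 ps bin requires a clock faster than this value.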

1.4. Overview

In this work, we present a 16-channel TDC ASIC prototype that provides ToF and ToT measurements. This chip is an evolution of MATRIX4 TDC ASIC [21], a four-channel TDC that provides ToF measurements using a patented technology [22]. The main contribution of this work is the Resistive Interpolation Mesh Circuit (RIMC), an improved Flash TDC architecture that allows improving the TDC resolution beyond the minimum delay element by using a combination of resistive interpolation and stochastic interpolation.

This paper is organized as follows: in Section 2, the building blocks of the chip are described; Section 3 describes the experimental setup; Section 4 shows the chip measurement results; Section 5 compares the ASIC performance of this work with state-of-the-art TDCs; and finally, in Section 6, the conclusions are drawn.

2. MATRIX TDC Design Overview

2.1. Building Blocks

MATRIX16 TDC receives 16 hit signals from a given frontend and converts each input pulse into two short pulses, one per edge. These short pulses latch the internal value of a coarse counter and the state of an array of coupled ring oscillators. The first gives the number of integer clock periods, while the second interpolates the clock phase. The captured data are then encoded, synchronized, buffered, and finally, serially transmitted with an LVDS driver. The block diagram of MATRIX16 is described in Figure 1, and the chip floor plan is shown in Figure 2.

As seen in the floor plan (Figure 2), the TDC core consists of a group of four clusters and the SPI slave interface block, which allows modifying the ASIC configuration via software. Each cluster manages four channels, comprising the following building blocks:

Edge Detector: This converts the edges (either rising or falling) of the input hit into narrow pulses;
Resistive Interpolation Mesh Circuit (RIMC): This is an array of coupled ring oscillators;
PLL: This provides a stable system clock and also generates the serializer output clock;
Time Capture Matrix (TCM): This stores the state of the RIMC at every hit edge;
Coarse Counter: This is a counter running at the system clock frequency. It provides the timestamp at every hit edge;
Backend Readout: This comprises the logic resources to encode, buffer, and transmit the acquired events;
Serializer: This is an 8:1 serializer that allows up to 920 Mbps transfer rates.

2.2. Edge Detector

The aim of this block is to convert an input hit into two narrow pulses so that both the rising and falling edges of the hit can be measured. An XNOR operation is performed between the input hit signal and a copy of itself delayed by a very short time (∼300 ps). The output of this operation is a very short (∼300 ps) active-low pulse every time an edge occurs at the TIME input. This signal is buffered and then sent to the TCM and Coarse Counter, where it triggers the TDC conversion.

Moreover, this block implements a filter that allows the user to ignore those TIME pulses narrower than a certain pulse width (programmable) and, in this way, avoid very short pulses produced by dark noise and afterpulsing on the SiPMs [23], which may produce readout errors. The decision of whether to discard the event or not is made by the Backend Readout block, and therefore, this circuitry does not add any timing uncertainty to the input hit.
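The XNOR-based edge detection can be illustrated with a small behavioral model on a sampled binary waveform. This is only a sketch: the sample-based delay, the function name `edge_pulses`, and the example waveform are assumptions, and the model ignores analog effects.

```python
def edge_pulses(signal, delay):
    """XNOR of a sampled binary signal with a copy delayed by `delay` samples.

    The output is 1 while both agree and 0 (active low) for `delay` samples
    after each edge. Samples before the start are assumed equal to signal[0].
    """
    out = []
    for i, s in enumerate(signal):
        d = signal[i - delay] if i >= delay else signal[0]
        out.append(1 if s == d else 0)
    return out

# One input hit: rising edge at sample 3, falling edge at sample 8.
hit = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0]
pulses = edge_pulses(hit, delay=2)
print(pulses)
```

Each edge of the hit yields one low pulse whose width equals the delay, so a single input pulse produces two capture triggers.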

2.3. RIMC

The circuit shown in Figure 3 is a novel clock synthesizer composed of a ring oscillator array coupled by means of resistors, thus providing 56 clock phases of the system clock. These phases are organized into seven rows by eight columns of Delay Elements (DEs). Note that oscillation is achieved by inserting an odd number of rows and connecting the outputs of the last DEs to the inputs of the first DEs. One of the benefits of this architecture is the mesh structure, which partially mitigates any local effect (mismatch) of process variations, since the neighbors will absorb part of the variations of a given node.

The DE (see Figure 3b) contains a current-starved inverter, which fixes the row width to 1/14 of the system clock period (from 119 ps in ULP mode to 78 ps in HP mode) by means of the Phase-Locked Loop (PLL) Control Voltage (VCTL), while the resistor introduces a 20 ps subgate delay between adjacent columns (from left to right). The typical end-to-end delay between the first and the last column nodes for a given row is 175 ps since the number of columns is eight (the first column on the left is used as a dummy). This delay is fixed, and it only depends on the manufacturing process conditions.
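The relation between the system clock and the row width can be checked with a short calculation. The sketch below uses the 1/14 fraction stated above together with the 800 MHz (typical) and 920 MHz (HP) clock frequencies mentioned elsewhere in the text; the function name is hypothetical.

```python
def row_delay_ps(clock_hz: float, fraction: int = 14) -> float:
    """Row width as 1/fraction of the system clock period, in ps."""
    period_ps = 1e12 / clock_hz
    return period_ps / fraction

for name, f in (("TYP", 800e6), ("HP", 920e6)):
    print(f"{name}: {row_delay_ps(f):.1f} ps")
```

At 920 MHz this gives about 78 ps, matching the HP value quoted in the text.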

The resistor value, which couples adjacent ring oscillators, is selected in such a way that there are always two DEs switching in consecutive rows (one rising edge and one falling edge) when operating in the typical mode (800 MHz). This avoids any clock duty cycle mismatch between adjacent rows, and it will allow the TDC to obtain time bins smaller than the 20 ps subdelay when combining the phase information of the measurements. Figure 4a,b shows the chronograms for the ULP and HP modes, respectively. It can be seen that the higher the RIMC oscillation frequency is, the higher the row overlapping and the smaller the bin size are. On the contrary, when reducing the RIMC frequency, row overlapping will decrease, and the bin size will increase accordingly. This adjustment can also be used to compensate subgate delay variations produced by changes in the RIMC resistor values, due to wafer-to-wafer and run-to-run variations during the manufacturing process.

2.4. PLL

The system clock is obtained from the RIMC, acting as a Voltage-Controlled Oscillator (VCO) from the PLL point of view. The PLL block (see Figure 5) consists of a Phase-Frequency Detector (PFD), a Charge Pump (CP), and the Clock Manager (CM). The PFD generates charge and discharge pulses proportional to the clock phase shift between the external reference clock (CLK_REF) and an internal feedback clock (CLK_FB) [24]. These charge and discharge signals drive the gates of two transistors acting as current sources. The pll_Icp bit allows modifying the delivered current to the intVctl node, which is connected to an RC circuit acting as a low-pass filter. C1 is a 3 bit switched capacitor, which allows a tuning range from 4 to 32 pF. The intVctl drives an operational amplifier acting as a unity gain buffer. This buffer will drive the VCTL node of the four RIMCs in the ASIC. Finally, the CM allows selecting the operating frequencies for the following clocks: feedback (PLL M factor), Backend Readout, Serializer, and ASIC output, which can operate either in SDR (Single Data Rate) or DDR (Double Data Rate) mode.

2.5. Time Capture Matrix

Both edges of the TIME<15:0> inputs are converted into short pulses by the Edge Detectors. As seen in Figure 6, these pulses latch full custom flip-flops optimized for fast timing (mismatch variability optimization and 50% duty cycle of the input data). The output of these flip-flops will contain the phases of the clock matrix coming from the RIMC. For each row, the eight phases plus the dummy node are sampled (T[Z][Y][8:0]). Then, the column identifier is encoded (COL[Z][Y][2:0]) and the event flag is computed (EVT[Z][Y]). This event flag indicates to the backend which row detected a transition, and it has to be taken into account to compute the TDC fine counter value.

2.6. Coarse Counter

This block complements the fine counter and provides a 10 bit counter based on a ripple carry adder. The block layout was implemented in full custom mode to optimize the critical path delays (to ensure reliability when counting at 920 MHz in HP mode) and also to optimize power consumption and area. The counter provides between 1.11 (HP) and 1.71 (ULP) microseconds of dynamic range for both ToF and ToT. An external system, such as a microcontroller or FPGA, can easily extend the ToF dynamic range to an arbitrary value.
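The quoted dynamic range follows from the counter width and the clock frequency. In the sketch below, the 920 MHz HP frequency comes from the text, while the 600 MHz ULP frequency is an inference from the 119 ps row width (14 rows per period) and should be treated as an assumption; the function name is hypothetical.

```python
def dynamic_range_us(bits: int, clock_hz: float) -> float:
    """Time span covered by a free-running counter before it wraps, in microseconds."""
    return (2 ** bits) / clock_hz * 1e6

hp = dynamic_range_us(10, 920e6)    # HP clock stated in the text
ulp = dynamic_range_us(10, 600e6)   # ULP clock inferred, not stated explicitly
print(f"HP: {hp:.2f} us, ULP: {ulp:.2f} us")
```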

2.7. Backend Readout

As seen in Figure 7, this block receives the digital representation of the incoming hits from both fine and coarse counters, then encodes, aligns, and filters (if necessary) the data, stores the events, and finally, sends the data to the Serializer. This block can operate at two frequencies (100 and 200 MHz in typical operating mode) depending on the required throughput.

The fine encoder block receives seven (one per row) encoded columns with the state of the RIMC when the hit occurred. The first nonzero column determines the offset (8 LSB per row) of the fine counter, and then, all the nonzero column values are summed, therefore achieving a counter ranging from 4 to 130 LSB. This combination of several row hits (stochastic TDC) allows computing TDC fine bins much smaller than the nominal subgate delay (20 ps) and allows different operating modes, allowing users to optimize the trade-off between timing resolution and power consumption in each application.

The coarse counter alignment block allows synchronizing both coarse and fine counters (asynchronous). It receives the 10 bit coarse counter measurement and the (LSB_CHANGE) alignment bit. This bit contains the clock phase of the coarse counter when the capture was performed, and it is compared with the fine counter. If a mismatch is detected (fine counter close to full scale and LSB_CHANGE=0), the coarse counter value is decreased by 1 LSB, and hence, the counters are synchronized.
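The alignment rule can be sketched as a small function. The 10 bit width and the decrement condition come from the text; the full-scale value, the `margin` that defines "close to full scale", and the modulo wrap at zero are assumptions made for this illustration, not details of the actual logic.

```python
COARSE_BITS = 10  # counter width given in the text

def align_coarse(coarse, fine, lsb_change, fine_full_scale, margin):
    """Decrement the coarse value when fine is near full scale and LSB_CHANGE is 0."""
    near_full_scale = fine >= fine_full_scale - margin  # hypothetical threshold
    if near_full_scale and lsb_change == 0:
        # Assumed behavior: wrap around within the counter width.
        return (coarse - 1) % (1 << COARSE_BITS)
    return coarse

print(align_coarse(5, 128, 0, fine_full_scale=130, margin=8))  # mismatch: corrected
print(align_coarse(5, 128, 1, fine_full_scale=130, margin=8))  # consistent: unchanged
print(align_coarse(5, 20, 0, fine_full_scale=130, margin=8))   # far from full scale
```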

The event builder receives the aligned fine and coarse counters and the Edge Detector TIME_FILTERED (see Section 2.2) signal after being synchronized. Once both the rising and falling edges are captured, the event is ready to be sent, and the data are stored into a 16-event FIFO. One event consists of 5 B: channel identifier, coarse and fine ToF/ToT, and debug bits.

Finally, the event transmitter block converts events into bytes and manages the data transmission protocol: it adds the Start-of-Packet and End-of-Packet bytes before and after the event transmission and the Idle byte when there is no activity. A chronogram example can be seen in Figure 8.

2.8. Serializer

This block performs the 8:1 parallel-to-serial conversion and transmission either in SDR or DDR mode. Serialized bits are driven by a Low-Voltage Differential Signaling (LVDS) driver with an adjustable differential mode current. Data transmission was successful at 920 Mbps in HP mode, even with the minimum differential current (0.35 mA, 0.65 mW power consumption).

3. Methods

This section provides an overview of the experimental setups employed to evaluate the performance of the MATRIX16 ASIC. The control and Data Acquisition (DAQ) system was based on two Printed Circuit Boards (PCBs): The first one was the motherboard, which had an Intel MAX 10 FPGA, a USB interface, voltage regulators, and interface connectors. The second PCB (mezzanine) contained the ASIC socket and the corresponding power regulators, which can be bypassed when an external power supply is used to characterize the ASIC using different supply voltages. Both boards were coupled by means of an LSHM connector, as seen in Figure 9. The FPGA controls the ASIC via SPI and acquires data from the Serializer outputs, then performs the communication with a host PC via the USB protocol.

The test bench to calibrate the ASIC and perform jitter measurements is depicted in Figure 10. A very stable clock is generated by a Pulse Pattern Generator (PPG) (Agilent 81110A), which produces a 100 MHz reference clock (CLK_REF) for the typical operating mode. This frequency varied according to the operating mode under test. MATRIX16 ASIC has an external trigger pin that can be internally redirected to each of the 16 ASIC channels, hence simplifying the measurement setup. This external trigger input can be connected to either Trigger_rnd (from FPGA) or Trigger_syn (from PPG). The power supply (Agilent E3631A) configuration and current consumption measurements, as well as the PPG control were also automated using the GPIB protocol.

The FPGA generates Trigger_rnd to perform calibration measurements, which is uncorrelated with the CLK_REF generated by the PPG. This allowed analyzing the statistical behavior of static Process, Voltage, and Temperature (PVT) variations, thus obtaining calibration tables and computing linearity for each ASIC. For the jitter measurements, both CLK_REF and Trigger_syn are provided by the PPG. This generator produces a pulse in phase with the clock, which can be electronically controlled via GPIB.

3.1. Linearity Test

The purpose of this test was twofold: On the one hand, it was performed to characterize the effects of static variability (temperature, IR drops, process, and mismatch), both within-die and die-to-die [25]. On the other hand, the test would provide calibration data, which would help to reduce the linearity error of the TDC and therefore improve the timing resolution.

Calibration was performed by means of a code density test [26]. This test consisted of producing a very large number (200 k in this case) of random pulse hits following a uniform distribution at the TDC input channels. Such a number of repetitions would reduce statistical fluctuations due to dynamic effects such as jitter. The binary code corresponding to wider TDC bins (slower stages) would occur more often than the narrower ones due to the uniform distribution of the incoming hits. TDC bin sizes can be obtained by normalizing the number of hits of each TDC bin to the RIMC oscillation period (see Equation (1)), since the sum of the hits for all codes is equivalent to the total number of hits.

Finally, we can obtain the Differential Nonlinearity (DNL) and the Integral Nonlinearity (INL) of each TDC channel (see Equations (2) and (3)), which would show the statistical impact of the mismatch on our TDC.

$$\mathrm{Width}_{Bin}\ (\mathrm{ps}) = \mathrm{RIMC}_{Period} \cdot \frac{\mathrm{NHits}_{Bin}}{\mathrm{NHits}_{Total}} \qquad (1)$$

$$\mathrm{DNL}[i] = \mathrm{Width}_{Bin}[i] - \mathrm{Width}_{Nominal} \qquad (2)$$

$$\mathrm{INL}[i] = \sum_{k=0}^{i} \mathrm{DNL}[k] \qquad (3)$$
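Equations (1)–(3) can be illustrated with a small simulated code density test. The sketch below is not the calibration software used for the ASIC: the four-bin TDC, its bin edges, the 1000 ps period, and the choice of the nominal width as the period divided by the number of codes are all assumptions made for the example.

```python
import bisect
import random

def code_density(hit_codes, n_codes, period_ps):
    """Estimate bin widths, DNL and INL (in ps) from uniformly distributed hits."""
    counts = [0] * n_codes
    for code in hit_codes:
        counts[code] += 1
    total = sum(counts)
    widths = [period_ps * c / total for c in counts]   # Equation (1)
    nominal = period_ps / n_codes                      # assumed nominal width
    dnl = [w - nominal for w in widths]                # Equation (2)
    inl, acc = [], 0.0
    for d in dnl:                                      # Equation (3)
        acc += d
        inl.append(acc)
    return widths, dnl, inl

# Hypothetical 4-bin TDC with unequal true bin widths of 200, 300, 200, 300 ps.
true_edges = [0.0, 200.0, 500.0, 700.0, 1000.0]
rng = random.Random(0)
codes = []
for _ in range(200_000):
    t = rng.uniform(0.0, 1000.0)
    codes.append(min(bisect.bisect_right(true_edges, t) - 1, 3))

widths, dnl, inl = code_density(codes, 4, 1000.0)
print([round(w, 1) for w in widths])
print([round(d, 1) for d in dnl])
```

Wider bins collect proportionally more hits, so the estimated widths converge to the true ones as the number of hits grows, and the last INL value is zero by construction because the widths sum to the period.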

3.2. Jitter

Jitter was measured by injecting N synchronous pulse shots (20 k in the current test) with the PPG and measuring the standard deviation of the ToF measurement. This procedure was repeated in 5 ps steps within the full dynamic range of the fine counter. The objective of the sweep was to obtain a more representative sampling of the jitter within the whole fine counter transfer function than a single measurement in an arbitrary phase. The jitter of a given channel was obtained by computing the quadratic mean of the jitter measurements.
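The per-channel figure is a quadratic mean (root mean square) of the jitter values measured at each delay step. A minimal sketch, with hypothetical jitter values:

```python
import math

def quadratic_mean(values):
    """Root mean square of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))

# Hypothetical standard deviations (ps) measured at three delay steps.
jitter_per_step_ps = [6.0, 8.0, 10.0]
channel_jitter = quadratic_mean(jitter_per_step_ps)
print(f"{channel_jitter:.2f} ps")
```

Compared with an arithmetic mean, the quadratic mean gives more weight to the phases where the jitter is worst.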

4. Experimental Results

This section shows the linearity, jitter, and power consumption measurements of the MATRIX16 chip prototype in different operating modes. The voltage and frequency settings under each operating mode (profile) are detailed in Table 1. HP mode pursues the maximum chip performance (timing resolution), while ULP mode focuses on optimizing power consumption. The intermediate modes (TYP and LP) try to reach a trade-off between power and performance.

The number of chip samples for the characterization was 15, leading to a population of 240 channels. The typical values shown in the plot legends in Figures 11–14 ($σ_{Typ}$) correspond to the quadratic mean of the 240 channels for each measurement.

4.1. Linearity Test

Figure 11 shows the DNL standard deviation distribution for the 240 channels, showing that the DNL standard deviation was typically around 2/3 of its corresponding fine counter LSB in all the operation modes. Figure 12 shows the maximum INL, which was typically around 2 LSB (3 LSB in the worst case). Figure 13 shows the maximum bin width that we could obtain from each TDC channel. This measurement allowed determining what the single-shot precision was in the worst scenario: around 3.5 LSB.

It is important to mention that once the corrections obtained from calibration data were applied, the DNL’s standard deviation was reduced to 2–4 ps, and the maximum INL became negligible.

4.2. Jitter

Figure 14 shows the ToF jitter standard deviation distribution for the 240 channels. It can be seen that for the same power supply voltage (TYP and HP modes), jitter linearly increased with the RIMC period, while it dramatically increased as the RIMC power supply scaled down (LP and ULP modes).

4.3. Power Consumption

Table 2 shows the typical power consumption for each ASIC power domain and performance profile, at a 100 kHz conversion rate. Most of the power consumption did not depend significantly on the ASIC conversion rate since the RIMC, which was always on, took around 60 to 70% of the power budget. Other circuits, such as the LVDS Serializer output lines, were also always on. The total ASIC power consumption was 46.5 mW (2.9 mW/ch) in ULP, 80.4 mW (5.0 mW/ch) in LP, 131 mW (8.2 mW/ch) in TYP, and 146 mW (9.1 mW/ch) in HP mode.

5. Discussion

As seen in the state-of-the-art review, FPGA and ASIC TDCs can offer similar performance in terms of TDC bin size and resolution. The advantages of FPGAs are a faster development time, lower prototyping cost, and higher flexibility. However, the power consumption requirements (<10 mW/ch) are too stringent for FPGA TDC designs, where silicon is not optimized for such purposes. The unit price and chip area are also key limiting factors for building large PET systems with thousands of TDC channels and high channel density requirements. Moreover, TDCs can be integrated into the same substrate where the sensor frontend readout circuitry is implemented, leading to very compact System-on-Chip (SoC) solutions [27,28] or digital SiPMs [29].

There are many multilevel TDC ASIC implementations in the literature. Each implementation type aims to optimize a given specification, and this makes it difficult to draw a fair comparison of the proposals. For this reason, Table 3 restricts the performance comparison to recent flash TDC implementations and our work. The Figure-of-Merit (FoM), defined in Table 3c, allows benchmarking the different proposals, where the minimum FoM corresponds to the TDC with the best combination between timing resolution and power consumption.

MATRIX16 not only increases channel density with respect to MATRIX4 [21], but also integrates new functionalities. The most important one is the ToT measurement, which increases the number of target applications for this ASIC. Event filtering by the pulse width reduces the number of dark count pulses to be processed by the readout system, and 4:1 multiplexed data links allow transmitting ToF+ToT information from 16 channels with the same number of Serializer links as MATRIX4. Moreover, the Backend Readout data encoding improvements slightly improved the timing resolution (from 10.1 to 9.5 ps, without calibration), and the low-power digital design techniques reduced power consumption (from 11.3 to 9.1 mW per channel). It is important to highlight that the ASIC presented in this work achieved a <10 ps timing resolution with less than 10 mW of power consumption per channel, which was one of the major constraints in the choice of the TDC architecture.

The most similar work to our proposal is PicoTDC [30], the flash TDC ASIC with the best FoM (69 fJ/conv) and timing resolution (3.4 ps without calibration), where resistive interpolation was also used to achieve subdelay elements. The main differences between PicoTDC and our proposal were the resistive mesh topology and the technology node: 65 nm in PicoTDC vs. 180 nm in MATRIX16. Even with this gap between the technologies, which penalizes the minimum TDC bin size and power consumption, the FoM achieved in this work (HP mode, 86 fJ/conv) was close to that of PicoTDC. The MATRIX16 architecture therefore clearly has room for improvement if implemented in a more advanced technology node.

The work in [15] is also remarkable, where 9.8 ps of resolution was achieved with 12 mW/ch (FoM of 119 fJ/conv). This is especially difficult in a 350 nm technology, with slower and more power-demanding transistors.

Table 3. MATRIX16 performance summary and comparison with recent proposals. $^{a}$ After calibration. $^{b}$ Core area (not full chip size). $^{c}$ Figure-of-Merit = Resolution · Power.

	MATRIX16 (HP)	MATRIX16 (LP)	[15]	MATRIX4 [21]	[31]	[30]	[32]	[33]
Process (nm)	180	180	350	180	130	65	65	45
ToF+ToT	Yes	Yes	No	No	Yes	Yes	Yes	No
LSB (ps)	8.6	12.4	8.9	9.3	125	3	102	25
Resolution (ps)	9.5 (8.0 $^{a}$ )	20.9 (19.4 $^{a}$ )	9.8	10.1	65.3	3.4 (1.3 $^{a}$ )	95	-
Channels	16	16	7	4	18	64	8	1
Power (mW/ch)	9.1	5.0	12.1	11.3	3.4	20.3	28.8	16
FoM $^{c}$ (fJ/conv)	86 (73 $^{a}$ )	102 (95 $^{a}$ )	119	114	222	69 (26 $^{a}$ )	2660	-
Area (mm $^{2}$ )	4.57	4.57	8.88	4.2	3.72	-	0.3 $^{b}$	0.36 $^{b}$
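The FoM column can be reproduced from the resolution and power rows, since 1 ps × 1 mW = 1 fJ. The short sketch below recomputes a few uncalibrated entries of Table 3; the function and variable names are hypothetical.

```python
def fom_fj(resolution_ps: float, power_mw: float) -> float:
    """Figure-of-Merit = Resolution * Power; ps * mW = 1e-12 s * 1e-3 W = 1 fJ."""
    return resolution_ps * power_mw

# (resolution in ps, power in mW/ch) taken from Table 3, without calibration.
rows = {
    "MATRIX16 (HP)": (9.5, 9.1),
    "[15]": (9.8, 12.1),
    "MATRIX4 [21]": (10.1, 11.3),
    "[30]": (3.4, 20.3),
}
for name, (res, power) in rows.items():
    print(f"{name}: {fom_fj(res, power):.0f} fJ/conv")
```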

6. Conclusions

A 16-channel TDC ASIC was designed, implemented, and tested. One of the key features of this chip is that it achieved an 8 ps time resolution (after calibration) with 9 mW/ch and a peak conversion rate of 50 MHz, making it suitable for ultra-fast timing applications with a moderate power consumption. The RIMC overlapping flexibility allows working under different modes, thus optimizing the trade-off between power consumption and timing resolution. In fact, assuming an excellent SPTR of a given state-of-the-art frontend (e.g., 60 ps sigma [8]), the impact on timing degradation produced by MATRIX16 in LP mode (20 ps) was very small (3.5 ps), while the power consumption was reduced by almost 50% (4.9 mW/ch).

The major obstacle that prevents using this ASIC in ULP mode (3 mW/ch) and beyond is the RIMC clock jitter, which degraded dramatically as the RIMC power supply (the largest power consumption contribution) was scaled down. Further research should aim to keep an acceptable clock jitter (<20 ps) even when the power supply is lowered to 1.2 V (nominal $V_{DD}$ is 1.8 V).

This ASIC was designed specifically to be the backend readout chip of the HR-FlexToT ASIC. Thus, both chips can be easily integrated into a system-in-package, which opens the door to building large PET systems with a high channel density while maintaining a low power consumption. Moreover, multiplexing the channel data outputs reduced the number of serializers by a factor of four.

Author Contributions

Conceptualization, J.M. (Joan Mauricio), S.G., A.S. (Andreu Sanuy), and D.G.; software, J.M. (Joan Mauricio); hardware, J.M. (Joan Mauricio), L.F., and A.S. (Andreu Sanuy); validation, all authors; investigation, J.M. (Joan Mauricio) and D.G.; writing—original draft preparation, J.M. (Joan Mauricio); writing—review and editing, J.M. (Joan Mauricio), L.F., A.S. (Andreu Sanuy), S.G., R.M., J.M. (Jesús Marín), J.M.P., E.P., P.R., D.S., A.S. (Anand Sanmukh), O.V., and D.G.; supervision, J.M. (Joan Mauricio), A.S. (Andreu Sanuy), S.G., and D.G.; project administration, J.M. (Joan Mauricio), J.M. (Jesús Marín), and D.G.; funding acquisition, J.M. (Jesús Marín) and D.G. All authors read and agreed to the published version of the manuscript.

Funding

This research was funded by the Spanish Ministerio de Economía y Competitividad (MINECO), Grant TEC2015-66002-R (MINECO/FEDER). We also acknowledge financial support from the State Agency for Research of the Spanish Ministry of Science and Innovation through the “Unit of Excellence María de Maeztu 2020-2023” award to the Institute of Cosmos Sciences (CEX2019-000918-M).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Harnew, N.; Bhasin, S.; Blake, T.; Brook, N.; Conneely, T.; Cussans, D.; van Dijk, M.; Forty, R.; Frei, C.; Gabriel, E.; et al. Status of the TORCH time-of-flight project. Nucl. Instrum. Methods Phys. Res. 2020, 952, 161692.
2. Vaquero, J.J.; Kinahan, P. Positron Emission Tomography: Current Challenges and Opportunities for Technological Advances in Clinical and Preclinical Imaging Systems. Ann. Rev. Biomed. Eng. 2015, 17, 385–414.
3. Pareige, C.; Lefebvre-Ulrikson, W.; Vurpillot, F.; Sauvage, X. Chapter Five - Time-of-Flight Mass Spectrometry and Composition Measurements. In Atom Probe Tomography; Lefebvre-Ulrikson, W., Vurpillot, F., Sauvage, X., Eds.; Academic Press: Cambridge, MA, USA, 2016; pp. 123–154.
4. Padmanabhan, P.; Zhang, C.; Charbon, E. Modeling and Analysis of a Direct Time-of-Flight Sensor Architecture for LiDAR Applications. Sensors 2019, 19, 5464.
5. Sharma, S. Time Over Threshold as a measure of energy response of plastic scintillators used in the J-PET detector. EPJ Web Conf. 2019, 199, 05014.
6. Du, J.; Schmall, J.; Judenhofer, M.; Di, K.; Yang, Y.; Cherry, S. A Time-Walk Correction Method for PET Detectors Based on Leading Edge Discriminators. IEEE Trans. Radiat. Plasma Med. Sci. 2017, 1, 385–390.
7. Comerma, A.; Gascon, D.; Freixas, L.; Garrido, L.; Graciani, R.; Marin, J.; Martinez, G.; Perez, J.M.; Mendes, P.R.; Castilla, J.; et al. FlexToT-Current Mode ASIC for Readout of Common Cathode SiPM Arrays; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2013.
8. Sanchez, D.; Gomez, S.; Mauricio, J.; Freixas, L.; Sanuy, A.; Guixe, G.; Lopez, A.; Manera, R.; Marín, J.; Perez, J.M.; et al. HRFlexToT: A High Dynamic Range ASIC for Time-of-Flight Positron Emission Tomography. IEEE Trans. Radiat. Plasma Med. Sci. 2021.
9. Gundacker, S.; Turtos, R.M.; Kratochwil, N.; Pots, R.H.; Paganoni, M.; Lecoq, P.; Auffray, E. Experimental time resolution limits of modern SiPMs and TOF-PET detectors exploring different scintillators and Cherenkov emission. Phys. Med. Biol. 2020, 65, 25001.
10. Wang, Y.; Liu, C. A 3.9 ps Time-Interval RMS Precision Time-to-Digital Converter Using a Dual-Sampling Method in an UltraScale FPGA. IEEE Trans. Nucl. Sci. 2016, 63, 2617–2621.
11. Zhang, M.; Wang, H.; Liu, Y. A 7.4 ps FPGA-Based TDC with a 1024-Unit Measurement Matrix. Sensors 2017, 17, 865.
12. Qin, X.; Wang, L.; Liu, D.; Zhao, Y.; Rong, X.; Du, J. A 1.15 ps Bin Size and 3.5 ps Single-Shot Precision Time-to-Digital-Converter with On-Board Offset Correction in an FPGA. IEEE Trans. Nucl. Sci. 2017, 64, 2951–2957.
13. Tang, Y.; Townsend, T.; Deng, H.; Liu, Y.; Zhang, R.; Chen, J. A Highly-Linear FPGA-Based TDC and a Low-Power Multi-Channel Readout ASIC with a Shared SAR ADC for SiPM Detectors. IEEE Trans. Nucl. Sci. 2021, 1.
14. Perktold, L.; Christiansen, J. A multichannel time-to-digital converter ASIC with better than 3ps RMS time resolution. J. Instrum. 2014, 9, C01060.
15. Jansson, J.P.; Mäntyniemi, A.; Kostamovaara, J. A multi-channel wide range time-to-digital converter with better than 9ps RMS precision for pulsed time-of-flight laser rangefinding. In Proceedings of the 2012 ESSCIRC (ESSCIRC), Bordeaux, France, 17–21 September 2012; pp. 273–276.
16. Dudek, P.; Szczepanski, S.; Hatfield, J. A high-resolution CMOS time-to-digital converter utilizing a Vernier delay line. IEEE J. Solid-State Circuits 2000, 35, 240–247.
17. Nguyen, V.N.; Duong, D.N.; Chung, Y.; Lee, J.W. A Cyclic Vernier Two-Step TDC for High Input Range Time-of-Flight Sensor Using Startup Time Correction Technique. Sensors 2018, 18, 3948.
18. Cheng, Z.; Deen, M.J.; Peng, H. A Low-Power Gateable Vernier Ring Oscillator Time-to-Digital Converter for Biomedical Imaging Applications. IEEE Trans. Biomed. Circuits Syst. 2016, 10, 445–454.
19. Wang, H.; Dai, F. A 14-Bit, 1-ps resolution, two-step ring and 2D Vernier TDC in 130nm CMOS technology. In Proceedings of the ESSCIRC 2017-43rd IEEE European Solid State Circuits Conference, Leuven, Belgium, 11–14 September 2017; pp. 143–146.
20. Abas, A.; Bystrov, A.; Kinniment, D.; Maevsky, O.; Russell, G.; Yakovlev, A. Time difference amplifier. Electron. Lett. 2002, 38, 1437–1438.
21. Mauricio, J.; Gascón, D.; Ciaglia, D.; Gómez, S.; Fernández, G.; Sanuy, A. MATRIX: A 15 ps resistive interpolation TDC ASIC based on a novel regular structure. J. Instrum. 2016, 11, C12047.
22. Mauricio, J.; Gascon, D. Resistive Interpolation Mesh Circuit for Time-to-Digital Converters. Patent WO/2017/134023, 10 August 2017.
23. Fischette, D. Practical Phase-Locked Loop Design. 2004. Available online: https://www.eecis.udel.edu/~vsaxena/courses/ece518/Handouts/PLLTutorialISSCC2004.pdf (accessed on 8 June 2021).
24. Bernstein, K.; Frank, D.J.; Gattiker, A.E.; Haensch, W.; Ji, B.L.; Nassif, S.R.; Nowak, E.J.; Pearson, D.J.; Rohrer, N.J. High-performance CMOS variability in the 65-nm regime and beyond. IBM J. Res. Dev. 2006, 50, 433–449.
25. Swann, B.; Blalock, B.; Clonts, L.; Binkley, D.; Rochelle, J.; Breeding, E.; Baldwin, K. A 100-ps time-resolution CMOS time-to-digital converter for positron emission tomography imaging applications. IEEE J. Solid-State Circuits 2004, 39, 1839–1852.
26. Sacco, I.; Fischer, P.; Ritzert, M. PETA4: A multi-channel TDC/ADC ASIC for SiPM readout. J. Instrum. 2013, 8, 23–27.
27. Muntean, A.; Venialgo, E.; Ardelean, A.; Sachdeva, A.; Ripiccini, E.; Palubiak, D.; Jackson, C.; Charbon, E. Blumino: The first fully integrated analog SiPM with on-chip time conversion. IEEE Trans. Radiat. Plasma Med. Sci. 2020, 1.
28. Roy, N.; Nolet, F.; Dubois, F.; Mercier, M.O.; Fontaine, R.; Pratte, J.F. Low Power and Small Area, 6.9 ps RMS Time-to-Digital Converter for 3-D Digital SiPM. IEEE Trans. Radiat. Plasma Med. Sci. 2017, 1, 486–494.
29. Horstmann, M.; Christiansen, J.; Altruda, S.; Lumer-Klabbers, G.; Jeffrey, P. picoTDC: Pico-Second TDC for HEP. Available online: https://indico.cern.ch/event/755407/contributions/3130541/attachments/1738732/2827254/picoTDCPresentation.pdf (accessed on 8 June 2021).
30. Chithra; Krishnapura, N. A Flexible 18-Channel Multi-Hit Time-to-Digital Converter for Trigger-Based Data Acquisition Systems. IEEE Trans. Circuits Syst. Regul. Pap. 2020, 67, 1892–1901.
31. Marino, N.; Baronti, F.; Fanucci, L.; Saponara, S.; Roncella, R.; Bisogni, M.G.; Del Guerra, A. A Multichannel and Compact Time to Digital Converter for Time of Flight Positron Emission Tomography. IEEE Trans. Nucl. Sci. 2015, 62, 814–823.
32. Ur Rehman, S.; Khafaji, M.M.; Carta, C.; Ellinger, F. A 16 mW 250 ps Double-Hit-Resolution Input-Sampled Time-to-Digital Converter in 45-nm CMOS. IEEE Trans. Circuits Syst. II Express Briefs 2018, 65, 562–566.

Figure 1. Block diagram of MATRIX16.

Figure 2. Detailed floor plan of the ASIC, including all the building blocks of one cluster of 4 channels.

Figure 3. (a) RIMC schematic. (b) DE schematic. (c) Starved inverter schematic.

Figure 4. Chronogram of the RIMC nodes (sorted by rows) in ULP and HP modes. The phase delay between columns within the same row is static, whereas the phase delay between rows is dynamic (it depends on the oscillation frequency).

Figure 5. Simplified PLL schematic. Top-left: Phase-Frequency Detector (PFD). Top-right: Charge Pump (CP). Bottom: Clock Manager.

Figure 6. Schematic of the Time Capture Matrix, channel Z, row Y.

Figure 7. Backend Readout block diagram corresponding to 1 cluster (4 channels).

Figure 8. Event Transmitter chronogram example (1 event sent). Start-of-Packet = 0x5C, End-of-Packet = 0xFA, Idle byte = 0xCA.

Figure 9. MATRIX16 test PCBs. (a) Motherboard hosting the MAX 10 FPGA, power regulators, and connectors. (b) Mezzanine board with the QFN64 footprint and socket. (c) Boards coupled via LHSM connectors.

Figure 10. Schematic representation of the experimental setup used to calibrate and evaluate the jitter of the ASICs.

Figure 11. DNL standard deviation for each MATRIX16 channel sample (without calibration).

Figure 12. Maximum INL for each MATRIX16 channel sample (without calibration).

Figure 13. Maximum bin width for each MATRIX16 channel sample (without calibration).

Figure 14. ToF jitter standard deviation for each MATRIX16 channel sample (after bin calibration).

Table 1. Supply voltage and oscillation frequency settings for each profile. VDD_FB corresponds to the Frontend and Backend Readout blocks’ power supply, while VDD_X corresponds to the oscillator power supply.

Profile	VDD_FB	VDD_X	RIMC Freq	Fine LSB (Typ)
ULP	1.6 V	1.2 V	600 MHz	13.2 ps
LP	1.8 V	1.5 V	640 MHz	12.4 ps
TYP	1.8 V	1.8 V	800 MHz	9.9 ps
HP	1.8 V	1.8 V	920 MHz	8.6 ps
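As a quick cross-check of Table 1, the clock period at each RIMC frequency can be divided by the typical fine LSB. The sketch below is plain arithmetic on the table values; the dictionary and function names are illustrative, and the resulting ratio should not be read as a statement about the internal organisation of the RIMC or the Time Capture Matrix.

```python
# Table 1 values: profile -> (RIMC frequency in MHz, typical fine LSB in ps)
TABLE_1 = {
    "ULP": (600.0, 13.2),
    "LP": (640.0, 12.4),
    "TYP": (800.0, 9.9),
    "HP": (920.0, 8.6),
}


def clock_period_ps(freq_mhz: float) -> float:
    """Clock period in picoseconds for a frequency given in MHz."""
    return 1e6 / freq_mhz


def lsb_steps_per_period(freq_mhz: float, lsb_ps: float) -> float:
    """Number of fine LSB steps that fit into one clock period."""
    return clock_period_ps(freq_mhz) / lsb_ps


ratios = {name: lsb_steps_per_period(f, lsb) for name, (f, lsb) in TABLE_1.items()}
for name, (freq, lsb) in TABLE_1.items():
    print(f"{name}: period {clock_period_ps(freq):.1f} ps, "
          f"{ratios[name]:.1f} LSB steps per period")
```

The ratio comes out close to 126 for all four profiles, which suggests that the typical fine LSB scales inversely with the RIMC frequency across the profiles.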

Table 2. Power consumption for each profile. $P_{VDD_F}$ supplies the Frontend Readout blocks: edge detectors, TCMs, and coarse counters. $P_{VDD_B}$ supplies the Backend Readout blocks, Serializers, and SPI. $P_{VDD_X}$ supplies the PLLs and RIMCs.

Profile	$P_{VDD_F}$ (mW)	$P_{VDD_B}$ (mW)	$P_{VDD_X}$ (mW)	$P_{Total}$ (mW)	$P_{Total}$ (mW/ch)
ULP	12.3	9.9	26.3	46.5	2.9
LP	17.6	12.8	50.0	80.4	5.0
TYP	20.4	15.4	94.7	130.5	8.2
HP	22.2	17.5	106.4	146.0	9.1
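The last column of Table 2 and the weight of the oscillator supply can be reproduced with a few lines. This is a minimal sketch that uses the listed totals as given and assumes the per-channel figure is simply the total divided by the 16 channels; the names are illustrative.

```python
N_CHANNELS = 16

# Table 2 values in mW: profile -> (P_VDD_F, P_VDD_B, P_VDD_X, P_Total)
TABLE_2 = {
    "ULP": (12.3, 9.9, 26.3, 46.5),
    "LP": (17.6, 12.8, 50.0, 80.4),
    "TYP": (20.4, 15.4, 94.7, 130.5),
    "HP": (22.2, 17.5, 106.4, 146.0),
}


def per_channel_mw(total_mw: float, n_channels: int = N_CHANNELS) -> float:
    """Total power divided evenly across the channels."""
    return total_mw / n_channels


def oscillator_share(p_vdd_x_mw: float, total_mw: float) -> float:
    """Fraction of the listed total drawn by the PLL/RIMC supply (VDD_X)."""
    return p_vdd_x_mw / total_mw


for name, (_p_f, _p_b, p_x, total) in TABLE_2.items():
    print(f"{name}: {per_channel_mw(total):.1f} mW/ch, "
          f"VDD_X share {oscillator_share(p_x, total):.0%}")
```

In every profile the VDD_X supply accounts for more than half of the listed total, in line with the remark in the conclusions that the RIMC supply is the largest contribution to the power consumption.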

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
