Ultra Miniature 1850 μm² Ring Oscillator Based Temperature Sensor

Temperature sensing is a necessity in semiconductor products, in order to monitor die behavior and avoid thermal runaway while achieving high performance. Integrated sensors are used to monitor and regulate numerous hot spots across the die to prevent reliability issues. As the hot spots are in the most congested areas of the chip, it is also desirable for the sensors to have a very small sensing element which can be placed close to the hot spot. The sensors are also used to monitor the coldest parts of the chip to determine the required Vdd level. These functions require the sensors to be very compact as well as low energy. A ring oscillator based temperature sensor is presented in TSMC's 65 nm node, with an area of 1850 µm². This sensor has a novel structure which is similar to a bandgap reference, with the BJT devices replaced by scaled ring oscillators. The sensor exhibits a 3-sigma inaccuracy of ±1 °C near the throttle point, for hot-spot sensing, and ±2.5 °C over the −10 °C to 110 °C range. The power supply rejection is 2.4 °C/V. The sensor consumes 0.94 nJ per 10 µs conversion and achieves a resolution FOM of 96 pJ·K².


I. INTRODUCTION
Thermal sensors are used to measure and regulate the temperature in nearly every computer system and integrated circuit (IC). The hot-spots in CPUs are identified by multiple thermal sensors spread across the die [1], [2]. When one of these hot-spots approaches the reliability limit, the sensor issues a warning to the Power Management Unit (PMU) or the Package Control Unit (PCU) of the chip, which causes the IC to reduce its frequency, a command referred to as throttling [3]-[5]. If the chip continues to heat up, there is an additional catastrophic temperature indicator, usually 15-20 °C above the throttle point, at which the platform shuts down [3]. At low operating voltages the IC frequency can exhibit an inverse temperature dependence, which causes the operating frequency to be lowered as the temperature drops [2]. As such, the sensors are also used to find the coldest parts of the CPU, in order to determine the Vdd level required to maintain frequency and avoid under-spec performance [5]. The sensors are also used to determine fan regulation of the entire system [6]. The accuracy of the sensor is thus linked to the power/performance of the chip.
The associate editor coordinating the review of this manuscript and approving it for publication was Yong Chen.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In addition, as multiple hot-spots and cold-spots need to be measured, there can be many sensors spread across the die (as many as 40) [2]. It is thus highly desirable that the sensors be compact (< 0.02 mm²) [1] and low energy. The thermal time constant of CPUs is ∼1-10 ms [2], so the sensors should have a sensing speed > 1 kS/s. If the sensors are faster, then they can be duty-cycled to save further power. During deep-sleep states, the sensors can be turned off to save power. Upon reawakening, a fast reading (10-20 µs) is required to determine the required supply level [2]. This initial reading is not required to be highly accurate, since some guard band can be added to the initial Vdd level to compensate. Vdd can be lowered subsequently without interfering with operation, once the sensor provides a better reading or an average of several readings. The specifications of compact sensors in a CPU are ±3 °C at the throttle point and ±5 °C across the rest of the range. Since part of the inaccuracy is associated with the testing and calibration, it is recommended that the Si accuracy be ±1 °C at throttle and ±3 °C over the range [2].
There are several mechanisms available to sense temperature [7], the most established in products being the parasitic PNP bipolar junction transistor (BJT) found in the CMOS process. The PNP sensor and its variants are based upon summing proportional-to-absolute-temperature (PTAT) and complementary-to-absolute-temperature (CTAT) nodes in a bandgap reference circuit to create a reference voltage, Vref [8]. The CTAT or PTAT nodes are then compared to Vref using an analog-to-digital converter. Resistors are also a good alternative to PNP-based sensors, although the sensing elements tend to be larger [9], [10].
MOS based sensors, which utilize the temperature dependence of the threshold voltage, Vth, have also been reported, and these can be both small and very fast [11]-[14]. Another type of sensor is the Thermal Diffusivity (TD) sensor, which can achieve high accuracy without calibration [15]. In this paper, we present a compact 1850 µm² ring-oscillator based sensor, which exhibits an energy of 0.94 nJ per conversion and a conversion time of 10 µs.

II. RING OSCILLATOR BASED SENSOR CIRCUIT DESIGN
A conventional bandgap reference (BGREF) current generator is shown in Fig. 1a. Since the two BJTs are sized differently, the difference between their Vbe voltages appears across the resistor [8] and the current obeys (1), where kT/q is the thermal voltage, R is the resistance and N is the BJT ratio. In Fig. 1b, the BJTs are replaced with diode-connected NMOS devices in subthreshold; the difference between their Vgs voltages appears across the resistor, and the current obeys (2). If the transistors are in strong inversion, the circuit in Fig. 1b obeys (3).
The MOS reference circuit can also have the smaller transistor operating in strong or moderate inversion, while the larger device operates in sub-threshold, although the equations in this case become more complex.
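For orientation, the three cases described above (BJT pair, subthreshold MOS pair, strong-inversion MOS pair) are commonly written in textbooks in the following form. This is a sketch under standard assumptions (ideal 1:1 current mirror, matched devices, body effect and channel-length modulation neglected) and is not taken from the paper; the symbols n (subthreshold slope factor) and µC_ox(W/L) (transconductance parameter of the smaller transistor) are introduced here for illustration and may differ from the authors' notation.

```latex
% BJT pair with emitter-area ratio N: Delta-Vbe across R
I_{BJT} = \frac{kT}{q}\,\frac{\ln N}{R}

% MOS pair in subthreshold with size ratio N: Delta-Vgs across R
I_{sub} = n\,\frac{kT}{q}\,\frac{\ln N}{R}

% MOS pair in strong inversion (square-law devices) with size ratio N
I_{strong} = \frac{2}{\mu C_{ox}\,(W/L)}\,\frac{1}{R^{2}}
             \left(1 - \frac{1}{\sqrt{N}}\right)^{2}
```

The first two expressions are PTAT, to the extent that R and n are temperature independent, which is the property the ring-oscillator circuit described next is intended to mimic.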
In a conventional bandgap-based thermal sensor, an accurate analog-to-digital converter (ADC) is required to digitize the PTAT or CTAT voltages [1], [2]. In addition, the parasitic PNPs used in BJT-based sensors are very large devices in modern CMOS processes. The advantage of a ring-oscillator sensor is that the temperature-dependent frequency just needs to be input into a counter to yield a digital code, which is output after a known period, thus avoiding the need for an ADC. Ring oscillators or other delay-based sensors, such as [11], can allow very fast conversion times at relatively high resolution and low power. They can also be designed to be relatively compact. However, ring oscillators are also highly sensitive to the supply voltage, which can make power supply rejection a difficult problem to solve using digital means alone.
In this design, it is proposed to replace the conventional BJTs or subthreshold MOS devices with ring oscillators (RO), as shown in Fig. 2. This topology combines the compact size and high resolution of ring-oscillator sensors with the analog regulation of bandgap-based sensors. RO1 and RO2 are similar oscillators, but RO2 was instantiated 8 times in parallel by shorting the input and output nodes of 8 identical oscillators. The feedback loop regulates the supply voltages of the ring oscillators, Vdd_fast and Vdd_slow, in the same way that the bandgap references in Fig. 1 regulate the Vbe or Vgs of the BJTs or MOS devices respectively. The current in each of the oscillators can be expressed as (4), or equivalently as (5), where F is the frequency of operation, C is the total capacitance in the oscillator and V is the oscillation amplitude. The amplifier forces the nodes Vdd_fast and Vres to an equal voltage, thereby yielding (6), which results in the relationship (7) for Vdd_slow.
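The relations described in the preceding paragraph can be sketched as follows. This is a plausible reading based only on the surrounding prose (switching current of a capacitive load, amplifier forcing Vdd_fast = Vres, and the supply difference appearing across R1); the placement of R1 between Vres and Vdd_slow is an assumption made by analogy with Fig. 1b, and the exact form of the authors' equations (4)-(7) may differ.

```latex
% Average current drawn by an oscillator with total capacitance C,
% swing V and frequency F (charge C*V delivered once per cycle)
I = C\,V\,F

% The same relation solved for frequency
F = \frac{I}{C\,V}

% Amplifier forces the two sensed nodes to be equal;
% assumed: R1 carries the branch current I between Vres and Vdd_slow
V_{dd\_fast} = V_{res} = V_{dd\_slow} + I\,R_{1}

% Resulting supply of the slow (8x) oscillator
V_{dd\_slow} = V_{dd\_fast} - I\,R_{1}
```

Read this way, the branch current is set by the supply difference divided by R1, just as the ΔVbe or ΔVgs sets the current in the bandgap-style references of Fig. 1.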
During steady-state operation, the current is relatively low (in the µA range), so the oscillators toggle close to Vth. As the temperature rises, the amplitude is lowered, due to the CTAT nature of Vth, which increases the operating frequency. In addition, the current exhibits a PTAT nature, similar to its BJT counterpart (Fig. 1a), which further increases the operating frequency. Both effects are close to linear, resulting in a nearly linear frequency rise with temperature. The currents in the two oscillators are equal and provided by the two current sources in Fig. 2, which are controlled by the feedback loop. Since the capacitance of RO2 is 8× larger, its operating frequency is much lower than that of RO1. However, it also has a slightly lower amplitude of operation, which slightly increases the frequency, so the ratio between the frequencies is approximately 5.5×. The sensor in Fig. 2 behaves very similarly to the MOS based BGREF of Fig. 1b. The Vdd_fast and Vdd_slow nodes are close to the Vth voltage. The difference between the oscillators' Vdd voltages appears across the resistor R1, as in (7). However, because of the toggling of the oscillators, some switching noise appears at these nodes. In order to mitigate these effects, decoupling capacitors were placed at nodes Vdd_fast (100 fF to Vss), Vdd_slow (50 fF to Vss) and PG (200 fF to Vdd).
Fig. 3 shows the top-level schematic of the sensor. This includes the sensing core with the two oscillators, as well as the feedback loop regulating them. There is a start-up circuit which ensures that the circuit reaches the correct operating point. Since the oscillators operate near Vth, level shifters are used to raise f1 and f2 to CMOS levels. Additional biases are required to enable the level shifting operation.
Conventional BGREF circuits, similar to Fig. 1, are usually started up by injecting a current into the circuit. This current, which may be dependent upon PVT effects, can place the circuit into its correct operating state. Usually the startup circuit consists of a feedback loop, which allows it to turn off automatically after powerup. BJT currents obey an exponential dependence on Vbe; thus, even if very high start-up currents are injected initially, the BJTs will clamp the Vbe voltage, and the circuit will settle to its correct operating point. However, the oscillators shown in Fig. 2 have a linear voltage dependence on current, as expressed in (4). Thus, the voltage drop across the oscillator is essentially unconstrained, and must be carefully controlled during startup. If the start-up currents are too high, it could lead to the node PG being clamped to Vss and the RO supplies reaching Vdd. In most conventional BGREFs, there are two stable states: the correct operating state, and a zero-current state. The RO based circuit has an additional stable state whereby there is too much current and regulation is lost. This is a dangerous unknown, making the startup problem more complex than in conventional BGREFs.
The startup circuit is shown in Fig. 3. The startup mechanism is based upon the assertion of the node PG to the supply voltage, and the oscillator nodes to the ground using dedicated MOS switches. This closes the current sources, and ensures the oscillators are not running, thus avoiding the situation where the circuit starts at very high currents. The startup consists of a low current bias created by two stacked diode-connected transistors (M4 and M5A), which provide a stable PTAT branch of current. Once this branch is turned on, at the start of the operation, the current is copied (with a multiplying factor) to M5B, pulling the gate of M3 high. This leads to the drop of node PG, opening the current sources. The current sources allow the charging of internal nodes, and place the amplifier in its working region, near the equilibrium governed by the exact temperature and supply voltage of the system. Likewise, the oscillators begin their oscillation, stabilizing at their native frequency, governed by the number of oscillators, current in the branch, and process variation. Once current is flowing in the system, it is mirrored to M2C, where it is copied to M1B through M1C, leading to the lowering of the gate of M3, thereby closing the startup circuit.
Similar to the BGREF circuits described in Fig. 1, an amplifier is at the heart of the system, holding the nodes Vdd_fast and Vres at equilibrium. To achieve high power supply rejection (PSR), a relatively high DC gain is required. The amplifier was therefore chosen to be a two-stage Miller amplifier, as shown in Fig. 4, with a nominal gain of 75 dB. Since the amplifier is a critical block in the system, it was chosen to be biased by the always-on startup bias circuit, Bias1, and not self-biased from the node PG. While self-biasing is an elegant and resource-saving option, it carries the risk of interfering with the startup mechanism. The amplifier is an integral part of the startup mechanism, taking over from the startup circuit once the nodes of the circuit are charged to their steady-state voltages. Taking the bias from PG rather than Bias1 (Fig. 3) risked the circuit waking up in the high-current mode in some of the corners.
The amplitudes of the RO supplies are close to Vth, and thus the frequency outputs need to be level shifted to CMOS levels. Connecting the oscillator directly to a standard level shifter input produces a varactor effect which can degrade the PSR. This is because the Cgs of the CMOS gates is highly Vdd dependent and affects the oscillator current and frequency according to (5). Thus, any change or noise in the supply is translated into an error in temperature. According to simulations, this could cause a PSR of over 30 °C/V. To counteract this effect, the Bias2 voltage was implemented, which is Vdd independent, as shown in Fig. 5. The bias is created using a replica of the amplifier circuit which biases a current branch along with the Vdd-independent PG bias. The supply voltage Vmid obeys the following equation: Thus Vmid is very close to the voltages of the RO supplies and can be used to supply buffers which isolate the RO frequencies f1 and f2 from the level shifter inputs, as shown in Fig. 5.
The replica was needed because the tail current bias of the amplifier itself could not be reused: its overall bias is Vdd-dependent, for the reasons mentioned earlier.
A counter block (not shown) was connected to the level shifter output, in which the oscillator's frequency was compared to a reference frequency. An adjustable counter allowed different settings of integration time, effectively enabling an averaging of several consecutive measurements if needed. Integrating the frequency over an extended period reduces the influence of noise from both the ROs and the external reference. Figures 6 and 7 show the simulated oscillator frequencies vs. temperature for the slow and fast oscillators respectively at different process corners. Since the circuit obeys equations (4)-(7), the spread between corners is small compared to that of a standard ring oscillator. We suspect that much of the remaining frequency shift across corners is associated with the variation of the resistance of R1 across corners.
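The counting scheme described above can be illustrated with a minimal sketch. It assumes the counter accumulates RO cycles during a window defined by a fixed number of reference-clock cycles; the function names, the window length of 1000 reference cycles (10 µs at 100 MHz) and the use of the simulated slow-oscillator sensitivity of 0.4 MHz/°C quoted in Section III are illustrative assumptions, not details of the authors' counter block.

```python
def window_seconds(f_ref_hz, n_ref_cycles):
    """Length of the counting window set by n_ref_cycles of the reference clock."""
    return n_ref_cycles / f_ref_hz


def counter_code(f_ro_hz, f_ref_hz, n_ref_cycles):
    """Whole RO cycles counted while n_ref_cycles of the reference clock elapse.

    Integer arithmetic is used so the result is an exact count.
    """
    return (f_ro_hz * n_ref_cycles) // f_ref_hz


def counts_per_degree(sensitivity_hz_per_c, f_ref_hz, n_ref_cycles):
    """Change in output code per degree C for a given frequency sensitivity."""
    return sensitivity_hz_per_c * n_ref_cycles / f_ref_hz


if __name__ == "__main__":
    F_REF = 100_000_000   # 100 MHz reference (value used in the measurements)
    N_REF = 1000          # hypothetical window: 1000 reference cycles
    print(window_seconds(F_REF, N_REF))              # window length in seconds
    print(counter_code(50_000_000, F_REF, N_REF))    # code for a hypothetical 50 MHz RO
    print(counts_per_degree(400_000, F_REF, N_REF))  # simulated 0.4 MHz/degC sensitivity
```

Doubling the number of reference cycles doubles the counts per degree, which is the sense in which a longer integration time trades conversion speed for resolution.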

III. SIMULATED AND MEASURED RESULTS
The results of transient Monte Carlo (MC) simulations of the slow and fast frequencies are shown in Fig. 8 and Fig. 9 for the TT corner. The steady-state frequency is plotted against temperature. Both curves are nearly linear with temperature, with the slow oscillator having a sensitivity of 0.4 MHz/°C, while the sensitivity of the fast oscillator is 1.45 MHz/°C. The observed spread in frequency is caused by random variation, including a combination of branch current, amplifier offsets and the Vth of the devices in the oscillators themselves. These variations translate directly into variations in the output frequency of the oscillators. Fig. 10 shows an internal oscillating node of the slow oscillator plotted against time, along with the simulated Vth voltage under nominal conditions. The peak voltage of oscillation is close to Vth, indicating that the slow oscillator operates in the subthreshold condition. The fast oscillator will have a larger peak-to-peak voltage.
A test chip of the sensor was designed and fabricated in TSMC's 65 nm node. The sensor's area is 1850 µm², not including the counter, which was shared among 8 sensors. The area of a 12-bit counter in this technology is 170 µm². The packaged chip was coupled to an aluminum heat sink with a graphite thermal pad. Measurements were made with the chip and custom test board in a Votsch VT 7004 Test Chamber. The chips' performance was validated and characterized over the standard temperature range of −10 °C to 110 °C. An external 100 MHz reference clock was provided, and the chip's temperature was monitored using a PT100 temperature probe embedded into the aluminum heat sink and read out using a 4-terminal sensing setup. The PT100 resistive thermal sensor, shaped as a 300 mm long rod, was calibrated by the manufacturer such that it provides an accuracy of 20 mK. It was placed through an opening in the oven into the heat sink such that its position in the heat sink was very close to the chip (within 1 mm). The opening was blocked from the outside by a seal which prevented the heat from escaping. During the chip's measurement, the PT100 was sampled several times to determine the temperature and to ensure that the temperature at the chip was stable. The experimental setup is shown in Fig. 11. A total of 35 sensors, over 5 dies, were measured in order to evaluate the statistical performance of the sensor. A die photo and the sensor's layout are shown in Fig. 12.
The measured output codes of the counters vs. temperature are shown in Fig. 13 and Fig. 14 for the slow and fast codes respectively. Both frequencies behave approximately linearly as a function of temperature, similar to the simulated results. The fast oscillator was found to be more sensitive to noise and variation, and thus less accurate, due to the smaller overall oscillator size and the low capacitive load on the oscillator nodes. The results hereafter are thus based upon the slow oscillator's performance.
Fig. 15 shows the measured sensor error vs. temperature with 2-point calibration for 35 sensors. The curvature of the sensor is apparent from this figure. This curvature can be nulled by applying a 2nd order polynomial correction, as is done in the prior art [9], [10]. In a production environment, the curvature would be measured across several hundred units and then applied to the rest of the lot (millions). Applying the 2nd order curvature correction yields Fig. 16, which exhibits approximately a 50% reduction in error at the center of the temperature range. The measured 3-sigma error is ±2.5 °C over the range, which is within the required specification for CPUs [2]. Note that the calibration points used are typical for products [2]. Usually the lower temperature would be used for wafer-level testing, while the higher temperature would be the maximum temperature of operation, at which point the sensor would indicate that the chip should throttle.
The circuit's resilience to changes in the supply voltage is shown in Fig. 17. The measured sensor code error relative to the nominal voltage (1.2 V) is plotted for several temperatures. The worst case PSR error is 2.4 °C/V. By measuring a sensor continuously over time, the noise behavior of a typical sensor can be evaluated. This is shown in Fig. 18 for repeated measurements at different temperatures. It is observed that the sensor toggles between ±1 LSB, which is 400 mK in this case.
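The 2-point calibration and batch-level curvature correction described above can be sketched as follows. This is one possible form under stated assumptions, not the authors' procedure: the curvature is modelled as a single-coefficient parabola that vanishes at both calibration temperatures, the coefficient is fitted by least squares on a small characterization set and then reused for other units, and all function names, codes and temperatures are hypothetical.

```python
def two_point_calibration(code_lo, temp_lo, code_hi, temp_hi):
    """Return a function mapping a raw counter code to temperature (deg C)
    using a straight line through the two calibration points."""
    slope = (temp_hi - temp_lo) / (code_hi - code_lo)  # deg C per count

    def code_to_temp(code):
        return temp_lo + slope * (code - code_lo)

    return code_to_temp


def fit_curvature(est_temps, ref_temps, temp_lo, temp_hi):
    """Least-squares fit of the coefficient a in the assumed error model
    error(T_est) = a * (T_est - temp_lo) * (T_est - temp_hi),
    where error = estimated temperature - reference temperature."""
    num = 0.0
    den = 0.0
    for est, ref in zip(est_temps, ref_temps):
        basis = (est - temp_lo) * (est - temp_hi)
        num += basis * (est - ref)
        den += basis * basis
    return num / den


def correct_curvature(est_temp, coeff, temp_lo, temp_hi):
    """Subtract the fitted parabolic error from a 2-point-calibrated estimate."""
    return est_temp - coeff * (est_temp - temp_lo) * (est_temp - temp_hi)


if __name__ == "__main__":
    # Hypothetical calibration points and codes, for illustration only.
    to_temp = two_point_calibration(1000, 20.0, 1200, 100.0)
    estimate = to_temp(1100)
    coeff = 5e-4  # hypothetical batch coefficient, in 1/degC
    print(estimate, correct_curvature(estimate, coeff, 20.0, 100.0))
```

Because the correction term is zero at both calibration temperatures, it leaves the per-unit 2-point calibration intact and only reshapes the error between and beyond those points.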
In products it is possible to take a fast initial measurement, and then improve the resolution by taking a moving average (MA) of subsequent measurements [2]. This is shown in Fig. 19 for moving averages of 2, 4, and 8 measurements. Fig. 20 exhibits the RMS resolution of the sensor vs. MA, which shows that the error is reduced for higher MA.
Figures 21 and 22 show graphical comparisons of the sensor's resolution FOM and accuracy vs. area respectively (borrowed with permission from [7]). A comparison of the sensor's performance to recent small sensors is presented in Table 1. The sensor is compared to small sensors which can meet the CPU area specification (< 0.02 mm² [2]). This sensor is one of the smallest and fastest sensors reported. Its accuracy and resolution are comparable to most other small sensors and meet the CPU specification. The conversion energy is one of the lowest amongst the small sensors. It is also possible to use the ring oscillator and level shifter as remote sensing elements by extending two analog wires from the rest of the readout circuit. Since the hot-spot areas are very congested, this enables the readout circuit to be positioned at a more remote location, with only the sensing element at the hot-spot. Among the sensors shown in Table 1, this sensor has the smallest such sensing element. The achieved resolution FOM of the sensor is among the best for sensors smaller than 5000 µm², since the conversion time is very fast (10 µs). This conversion time may enable a quick temperature reading when the chip exits deep sleep states. It can also be used to duty-cycle the temperature readings to save power. We are planning to add offset cancellation circuitry to this design in future revisions. This could facilitate higher accuracy while potentially saving area.
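The moving-average readout mentioned above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation; the function names are hypothetical and the example readings are synthetic values that toggle by one 400 mK LSB around a fixed temperature.

```python
import math
from collections import deque


def moving_average(readings, window):
    """Yield the mean of the most recent `window` readings.

    The first outputs average fewer samples, so a fast initial reading is
    available immediately and is refined as more conversions arrive.
    """
    buf = deque(maxlen=window)
    total = 0.0
    for reading in readings:
        if len(buf) == window:
            total -= buf[0]  # oldest sample is about to be dropped
        buf.append(reading)
        total += reading
        yield total / len(buf)


def rms_error(values, reference):
    """Root-mean-square deviation of values from a reference temperature."""
    return math.sqrt(sum((v - reference) ** 2 for v in values) / len(values))


if __name__ == "__main__":
    # Synthetic readings toggling +/- 0.4 degC around 60 degC.
    raw = [60.4, 59.6, 60.4, 59.6, 60.4, 59.6, 60.4, 59.6]
    for window in (1, 2, 4, 8):
        averaged = list(moving_average(raw, window))
        print(window, rms_error(averaged, 60.0))
```

For noise that is uncorrelated from one conversion to the next, averaging N readings is expected to reduce the RMS error by roughly the square root of N, at the cost of an N times longer effective conversion time.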

IV. DISCUSSION AND CONCLUSIONS
A sensor is shown which uses ring oscillators instead of BJTs in a bandgap-like circuit structure. A nearly linear temperature-dependent frequency is observed. The sensor benefits from the compact area, low power, and high-resolution characteristics of a digital circuit. It also exhibits the relative process independence and power supply rejection of an analog circuit. The sensor excels both in size, 1850 µm², and conversion time, 10 µs. It achieves a resolution FOM of 96 pJ·K², which is highly competitive among the compact sensors. These features make the sensor attractive for dense thermal monitoring in IC products.
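The headline numbers can be cross-checked with a little arithmetic. The sketch below assumes the commonly used definition of the resolution FOM as energy per conversion multiplied by the square of the resolution; the text does not spell out the definition, so the implied resolution is an inference rather than a reported value. The 1 kS/s rate is the minimum sampling speed quoted in the introduction, and the duty-cycled power ignores leakage and start-up overhead.

```python
import math

ENERGY_PER_CONVERSION_J = 0.94e-9  # 0.94 nJ per conversion (from the text)
CONVERSION_TIME_S = 10e-6          # 10 us conversion time (from the text)
RESOLUTION_FOM_J_K2 = 96e-12       # 96 pJ*K^2 (from the text)


def implied_resolution_k(fom_j_k2, energy_j):
    """Resolution implied by FOM = energy * resolution**2 (assumed definition)."""
    return math.sqrt(fom_j_k2 / energy_j)


def active_power_w(energy_j, conversion_time_s):
    """Average power drawn while a conversion is in progress."""
    return energy_j / conversion_time_s


def duty_cycled_power_w(energy_j, sample_rate_hz):
    """Average power when the sensor is only on during conversions."""
    return energy_j * sample_rate_hz


if __name__ == "__main__":
    print(implied_resolution_k(RESOLUTION_FOM_J_K2, ENERGY_PER_CONVERSION_J))
    print(active_power_w(ENERGY_PER_CONVERSION_J, CONVERSION_TIME_S))
    print(duty_cycled_power_w(ENERGY_PER_CONVERSION_J, 1000))
```

Under these assumptions the implied resolution is about 0.32 K, the power during a conversion is about 94 µW, and sampling at 1 kS/s corresponds to a 1% duty cycle and an average of roughly 0.94 µW.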